Public Datasets Pipelines (public-datasets-pipelines)

Overview

Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program.

Requirements

The steps below assume a working Python 3 installation, Pipenv, Terraform, and access to a Google Cloud project.

Environment Setup

We use Pipenv to make environment setup more deterministic and uniform across different machines.

If you haven't done so, install Pipenv by following its official installation instructions. With Pipenv installed, run the following command:

pipenv install --ignore-pipfile --dev

This uses the Pipfile.lock found in the project root and installs all the development dependencies.

Finally, initialize the Airflow database:

pipenv run airflow initdb

Building Data Pipelines

Configuring, generating, and deploying data pipelines in a programmatic, standardized, and scalable way is the main purpose of this repository.

Follow the steps below to build a data pipeline for your dataset:

1. Create a folder hierarchy for your pipeline

mkdir -p datasets/DATASET/PIPELINE

For example:
datasets/covid19_tracking/national_testing_and_outcomes

where DATASET is the dataset name or category that your pipeline belongs to, and PIPELINE is your pipeline's name.

For examples of pipeline names, browse the existing pipeline folders under the datasets directory in the repo.

Use only underscores and alpha-numeric characters for the names.

2. Write your config (YAML) files

If you created a new dataset directory above, you need to create a datasets/DATASET/dataset.yaml config file. See the YAML Config Reference section below for details on dataset.yaml.

Create a datasets/DATASET/PIPELINE/pipeline.yaml config file for your pipeline. See the YAML Config Reference section below for details on pipeline.yaml.

If you'd like to get started faster, you can inspect config files that already exist in the repository and infer the patterns from there.

Every YAML file supports a resources block. To use it, identify which Google Cloud resources need to be provisioned for your pipelines. Some examples are:

  • BigQuery datasets and tables to store final, customer-facing data
  • GCS bucket to store intermediate, midstream data
  • GCS bucket to store final, downstream, customer-facing data
  • Sometimes, for very large datasets, you might need to provision a Dataflow job

3. Generate Terraform files and actuate GCP resources

Run the following command from the project root:

$ python scripts/generate_terraform.py \
    --dataset DATASET_DIR_NAME \
    --gcp-project-id GCP_PROJECT_ID \
    --region REGION \
    --bucket-name-prefix UNIQUE_BUCKET_PREFIX \
    [--env] dev \
    [--tf-apply] \
    [--impersonating-acct] IMPERSONATING_SERVICE_ACCT

This generates Terraform files (*.tf) in a _terraform directory inside that dataset. The files contain infrastructure-as-code that describes which GCP resources need to be actuated for use by the pipelines. If you passed in the --tf-apply parameter, the command will also run terraform apply to actuate those resources.

The --bucket-name-prefix is used to ensure that the buckets created by different environments and contributors are kept unique. This satisfies the requirement that bucket names be globally unique across all of GCS. Use hyphenated names (some-prefix-123) instead of snake_case or underscores (some_prefix_123).

In addition, the command above creates a "dot" directory in the project root. The directory name is the value you pass to the --env parameter of the command. If no --env argument was passed, the value defaults to dev (which generates the .dev folder).

Consider this "dot" directory as your own dedicated space for prototyping. The files and variables created in that directory will use an isolated environment. All such directories are gitignored.

As a concrete example, the unit tests use a temporary .test directory as their environment.

4. Generate DAGs and container images

Run the following command from the project root:

$ python scripts/generate_dag.py \
    --dataset DATASET_DIR \
    --pipeline PIPELINE_DIR \
    [--skip-builds] \
    [--env] dev

This generates a Python file that represents the DAG (directed acyclic graph) for the pipeline (the dot dir also gets a copy). To standardize DAG files, the resulting Python code is based entirely on the contents of the pipeline.yaml config file.

Using KubernetesPodOperator requires a container image to be available for use. The command above lets this architecture build that image and push it to Google Container Registry on your behalf. Follow the steps below to prepare your container image:

  1. Create an _images folder under your dataset folder if it doesn't exist.

  2. Inside the _images folder, create another folder and name it after what the image is expected to do, e.g. process_shapefiles, read_cdf_metadata.

  3. In that subfolder, create a Dockerfile and any scripts you need to process the data. See the samples/container folder for an example, and see the sketch at the end of this step for what such a script might look like. Use the COPY command in your Dockerfile to include your scripts in the image.

The resulting file tree for a dataset that uses two container images may look like

datasets
└── DATASET
    ├── _images
    │   ├── container_a
    │   │   ├── Dockerfile
    │   │   ├── requirements.txt
    │   │   └── script.py
    │   └── container_b
    │       ├── Dockerfile
    │       ├── requirements.txt
    │       └── script.py
    ├── _terraform/
    ├── PIPELINE_A
    ├── PIPELINE_B
    ├── ...
    └── dataset.yaml

Docker images will be built and pushed to GCR by default whenever the command above is run. To skip building and pushing images, use the optional --skip-builds flag.
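For reference, the scripts inside an _images subfolder are usually driven entirely by environment variables that the pipeline's operator passes in at runtime. The following is only an illustrative sketch, assuming a hypothetical image that downloads a CSV and normalizes its column names; the environment variable names and the transform logic are assumptions, not code taken from samples/container:

# Illustrative sketch of an _images/<image_name>/script.py (not from the repository).
# The container receives all of its inputs via environment variables set by the operator.
import logging
import os
import pathlib

import pandas as pd  # assumed to be listed in the image's requirements.txt
import requests      # assumed to be listed in the image's requirements.txt


def main(source_url: str, source_file: pathlib.Path, target_file: pathlib.Path) -> None:
    # Download the raw source file
    logging.info(f"Downloading {source_url}")
    response = requests.get(source_url, timeout=300)
    response.raise_for_status()
    source_file.write_bytes(response.content)

    # Apply a simple in-memory transform: snake_case the column names
    df = pd.read_csv(source_file)
    df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]

    # Write the BigQuery-friendly CSV; a downstream task picks it up from here
    df.to_csv(target_file, index=False)
    logging.info(f"Wrote {len(df)} rows to {target_file}")


if __name__ == "__main__":
    logging.getLogger().setLevel(logging.INFO)
    main(
        source_url=os.environ["SOURCE_URL"],
        source_file=pathlib.Path(os.environ["SOURCE_FILE"]),
        target_file=pathlib.Path(os.environ["TARGET_FILE"]),
    )

The matching Dockerfile then only needs to install the dependencies and COPY this script into the image.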

5. Declare and set your pipeline variables

Running the command in the previous step will parse your pipeline config and inform you about the templated variables that need to be set for your pipeline to run.

All variables used by a dataset must have their values set in

  [.dev|.test]/datasets/{DATASET}/{DATASET}_variables.json

Airflow variables use JSON dot notation to access the variable's value. For example, if you're using the following variables in your pipeline config:

  • {{ var.json.shared.composer_bucket }}
  • {{ var.json.parent.nested }}
  • {{ var.json.parent.another_nested }}

then your variables JSON file should look like this

{
  "shared": {
    "composer_bucket": "us-east4-test-pipelines-abcde1234-bucket"
  },
  "parent": {
    "nested": "some value",
    "another_nested": "another value"
  }
}
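If you want to double-check that every templated variable reported by generate_dag.py has a matching entry, a small throwaway helper can flatten the JSON file into var.json-style dotted paths. This is not a script that ships with the repository, just an illustrative sketch (the file path below assumes the covid19_tracking dataset in the .dev environment):

# Hypothetical helper (not part of the repository): flatten a {DATASET}_variables.json
# file into the dotted paths that pipeline configs reference via var.json.
import json
import pathlib


def flatten(prefix: str, value) -> dict:
    """Recursively map nested JSON objects to 'var.json.<dotted.path>' keys."""
    if not isinstance(value, dict):
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        flat.update(flatten(f"{prefix}.{key}", child))
    return flat


variables_file = pathlib.Path(".dev/datasets/covid19_tracking/covid19_tracking_variables.json")
for dotted_path, value in sorted(flatten("var.json", json.loads(variables_file.read_text())).items()):
    print(f"{{{{ {dotted_path} }}}} -> {value}")

Run against the example JSON above, this prints {{ var.json.shared.composer_bucket }}, {{ var.json.parent.nested }}, and {{ var.json.parent.another_nested }} alongside their values.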

6. Deploy the DAGs and variables

Deploy the DAG and the variables to your own Cloud Composer environment using the following command:

$ python scripts/deploy_dag.py \
  --dataset DATASET \
  --composer-env CLOUD_COMPOSER_ENVIRONMENT_NAME \
  --composer-bucket CLOUD_COMPOSER_BUCKET \
  --composer-region CLOUD_COMPOSER_REGION \
  --env ENV

Testing

Run the unit tests from the project root as follows:

$ pipenv run python -m pytest -v

YAML Config Reference

Every dataset and pipeline folder must contain a dataset.yaml and a pipeline.yaml configuration file, respectively. To see how the two files are structured, inspect the config files that already exist under the datasets folder in the repository.

Best Practices

  • When running scripts/generate_terraform.py, the --bucket-name-prefix argument helps prevent GCS bucket name collisions because bucket names must be globally unique. Use hyphens rather than underscores for the prefix, and make it as unique and specific to your own environment or use case as possible.

  • When naming BigQuery columns, always use snake_case and lowercase.

  • When specifying BigQuery schemas, be explicit and always include name, type, and mode for every column. For column descriptions, derive them from the data source's definitions when available.

  • When provisioning resources for pipelines, a good rule-of-thumb is one bucket per dataset, where intermediate data used by various pipelines (under that dataset) are stored in distinct paths under the same bucket. For example:

    gs://covid19-tracking-project-intermediate
        /dev
            /preprocessed_tests_and_outcomes
            /preprocessed_vaccinations
        /staging
            /national_tests_and_outcomes
            /state_tests_and_outcomes
            /state_vaccinations
        /prod
            /national_tests_and_outcomes
            /state_tests_and_outcomes
            /state_vaccinations
    
    

    The "one bucket per dataset" rule prevents us from creating too many buckets for too many purposes. This also helps in discoverability and organization as we scale to thousands of datasets and pipelines.

    Quick note: If the data conveniently fits in memory and the transforms are close to trivial and computationally cheap, you may skip storing midstream data altogether. Just apply the transformations in one go and store the resulting data in its final destination, as in the sketch below.
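    As a rough sketch of that in-memory approach, assuming a small CSV, a trivial transform, and pandas-gbq for the load (the URL, project, and table names below are placeholders):

        # Illustrative only: the dataset is small and the transform is trivial, so there is
        # no intermediate GCS storage; download, transform in memory, load straight to BigQuery.
        import pandas as pd
        import pandas_gbq

        df = pd.read_csv("https://example.com/national_tests_and_outcomes.csv")
        df.columns = [col.lower().replace(" ", "_") for col in df.columns]  # snake_case column names

        pandas_gbq.to_gbq(
            df,
            destination_table="covid19_tracking.national_tests_and_outcomes",
            project_id="my-gcp-project",
            if_exists="replace",
        )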

Comments
  • Feat: Onboard New york taxi trips dataset

    Description

    dataset: new_york_taxi_trips
    pipelines: tlc_green_trips, tlc_yellow_trips

    Checklist

    Note: If an item applies to you, all of its sub-items must be fulfilled

    • [x] (Required) This pull request is appropriately labeled
    • [x] Please merge this pull request after it's approved
    • [x] I'm adding or editing a dataset
      • [ ] The Google Cloud Datasets team is aware of the proposed dataset
      • [ ] I put all my code inside datasets/new_york_taxi_trips and nothing outside of that directory
    opened by nlarge-google 11
  • feature: Initial implementation for austin_311.311_service_requests

    "Pipeline for austin_311.311_Service_Requests"

    Description

    v2 architecture implementation of 311_service_requests in Austin, TX. This implements the first version of the CSV transform Python script.

    Based on #

    Note: It's recommended to open an issue first for context and discussion.

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [ ] Please merge this PR for me once it is approved.
    • [ ] If this PR adds or edits a feature, I have updated the README accordingly.
    • [ ] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
    • [ ] If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/<YOUR-DATASET> and nothing outside of that directory.
    • [ ] If this PR adds or edits a dataset or pipeline that I'm responsible for maintaining, my GitHub username is in the CONTRIBUTORS file.
    • [ ] This PR is appropriately labeled.
    cla: yes 
    opened by nlarge-google 11
  • feat: Onboard NOAA

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [x] Please merge this PR for me once it is approved.
    • [x] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
    • [x] If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/noaa and nothing outside of that directory.
    • [x] This PR is appropriately labeled.
    data onboarding cla: yes 
    opened by nlarge-google 7
  • feat: Onboard EPA historical air quality dataset

    Description

    Included: Annual summaries, CO Daily Summary, CO Hourly Summary, HAP Daily Summary, HAP Hourly Summary, Lead Daily Summary, NO2 Daily Summary, NO2 Hourly Summary, NONOxNOy Daily Summary, NONOxNOy Hourly Summary, Ozone Daily Summary, Ozone Hourly Summary, PM 10 Daily Summary, PM10 Hourly Summary, PM25 Frm Hourly Summary, PM25 NonFrm Daily Summary, PM25 NonFrm Hourly Summary, PM25 Speciation Daily Summary, PM25 Speciation Hourly Summary, Pressure Daily Summary, Pressure Hourly Summary, RH and DP Daily Summary, RH and DP Hourly Summary, SO2 Daily Summary, SO2 Hourly Summary, Temperature Daily Summary, Temperature Hourly Summary, VOC Daily Summary, VOC Hourly Summary, Wind Daily Summary, Wind Hourly Summary

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [x] Please merge this PR for me once it is approved.
    • [x] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
    • [x] If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/epa_historical_air_quality and nothing outside of that directory.
    • [x] This PR is appropriately labeled.
    cla: yes 
    opened by nlarge-google 6
  • feat: Onboard San Francisco Bikeshare Trips

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [x] Please merge this PR for me once it is approved.
    • [x] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
    • [x] If this PR adds or edits a dataset or pipeline, I put all my code inside san_francisco_bikeshare_trips/bikeshare_trips and nothing outside of that directory.
    • [x] This PR is appropriately labeled.
    cla: yes 
    opened by nlarge-google 6
  • feat: Onboard Census opportunity atlas tract outcomes

    Description

    Tract Outcomes

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [x] Please merge this PR for me once it is approved.
    • [x] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.
    • [x] If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/census_opportunity_atlas and nothing outside of that directory.
    • [x] This PR is appropriately labeled.
    opened by nlarge-google 5
  • feat: Onboard Census Bureau International Dataset

    Description

    Based on #

    Note: It's recommended to open an issue first for context and discussion.

    Checklist

    Note: Delete items below that aren't applicable to your pull request.

    • [x] Please merge this PR for me once it is approved.

    • [x] If this PR adds or edits a dataset or pipeline, it was reviewed and approved by the Google Cloud Public Datasets team beforehand.

    • [x] If this PR adds or edits a dataset or pipeline, I put all my code inside datasets/census_bureau_international and nothing outside of that directory.

    • [x] This PR is appropriately labeled.

    data onboarding cla: yes 
    opened by vasuc-google 5
  • Containerize custom tasks

    Note: The following is taken from @tswast's recommendation on a separate thread.

    What are you trying to accomplish?

    One of the Airflow "gotchas" is that workers share resources with the scheduler, so any "real work" that uses CPU and/or memory can cause slowdowns in the scheduler or even instability if memory is used up.

    The recommendation is to do any "real work" outside the Airflow workers themselves, for example in a task scheduled on a separate node pool (such as a KubernetesPodOperator).

    What challenges are you running into?

    In the generated DAG, I see the following operator:

        # Run the custom/csv_transform.py script to process the raw CSV contents into a BigQuery friendly format
        process_raw_csv_file = bash_operator.BashOperator(
            task_id="process_raw_csv_file",
            bash_command="SOURCE_CSV=$airflow_home/data/$dataset/$pipeline/{{ ds }}/raw-data.csv TARGET_CSV=$airflow_home/data/$dataset/$pipeline/{{ ds }}/data.csv python $airflow_home/dags/$dataset/$pipeline/custom/csv_transform.py\n",
            env={'airflow_home': '{{ var.json.shared.airflow_home }}', 'dataset': 'covid19_tracking', 'pipeline': 'city_level_cases_and_deaths'},
        )
    

    I haven't looked closely at the csv_transform.py script yet, but I'd expect it to use non-trivial CPU / memory resources.

    For custom Python scripts such as this, I'd expect us to use the KubernetesPodOperator, where the work is scheduled on a separate node pool.
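    As a rough illustration of that suggestion (this is not code from the repository; the image name, node pool, and GCS paths are assumptions), the same step might instead be declared with a KubernetesPodOperator along these lines:

        # Sketch only: run the CSV transform in its own container rather than on the Airflow
        # workers. The import path is the Airflow 2 cncf.kubernetes provider location.
        from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

        process_raw_csv_file = KubernetesPodOperator(
            task_id="process_raw_csv_file",
            name="process_raw_csv_file",
            namespace="default",
            # Hypothetical image built from datasets/covid19_tracking/_images/csv_transform
            image="gcr.io/PROJECT_ID/covid19_tracking__csv_transform",
            env_vars={
                # The pod cannot see the worker's local disk, so the paths point at GCS instead
                "SOURCE_CSV": "gs://{{ var.json.shared.composer_bucket }}/data/covid19_tracking/city_level_cases_and_deaths/{{ ds }}/raw-data.csv",
                "TARGET_CSV": "gs://{{ var.json.shared.composer_bucket }}/data/covid19_tracking/city_level_cases_and_deaths/{{ ds }}/data.csv",
            },
            # Keep the heavy lifting on a dedicated node pool (the pool name is an assumption)
            affinity={
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [
                            {
                                "matchExpressions": [
                                    {
                                        "key": "cloud.google.com/gke-nodepool",
                                        "operator": "In",
                                        "values": ["pool-e2-standard-4"],
                                    }
                                ]
                            }
                        ]
                    }
                }
            },
        )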

    Checklist

    • [x] I created this issue in accordance with the Code of Conduct.
    • [x] This issue is appropriately labeled.
    feature request 
    opened by adlersantos 5
  • Feat: Onboard Mimiciii dataset

    Description

    This is to onboard the mimiciii dataset with 25 pipelines using Airflow v2 operators only.

    Checklist

    • [x] (Required) This pull request is appropriately labeled
    • [x] Please merge this pull request after it's approved

    Use the sections below based on what's applicable to your PR and delete the rest:

    Feature

    • [ ] I'm adding or editing a feature
    • [ ] I have updated the README accordingly
    • [ ] I have added/revised tests for the feature

    Data Onboarding

    • [x] I'm adding or editing a dataset
    • [x] The Google Cloud Datasets team is aware of the proposed dataset
    • [x] I put all my code inside datasets/mimiciii and nothing outside of that directory

    Code cleanup or refactoring

    • [x] I'm refactoring or cleaning up some code
    data onboarding 
    opened by Naveen130 4
  • Refactor: Combine New York pipelines into one

    Description

    These are changes and clean-up to the existing dataset pipelines for new-york:

    311_service_requests, citibike_stations, nypd_mv_collisions

    Checklist

    Note: If an item applies to you, all of its sub-items must be fulfilled

    • [x] (Required) This pull request is appropriately labeled
    • [x] Please merge this pull request after it's approved
    • [x] I'm adding or editing a dataset
      • [x] The Google Cloud Datasets team is aware of the proposed dataset
      • [x] I put all my code inside datasets/new_york and nothing outside of that directory
    • [x] I'm refactoring or cleaning up some code
    opened by nlarge-google 4
  • Feat: Onboard SEC Failure to Deliver dataset

    Checklist

    Note: If an item applies to you, all of its sub-items must be fulfilled

    • [x] (Required) This pull request is appropriately labeled
    • [x] Please merge this pull request after it's approved
    • [x] I'm adding or editing a dataset
      • [x] The Google Cloud Datasets team is aware of the proposed dataset
      • [x] I put all my code inside datasets/sec_failure_to_deliver and nothing outside of that directory
    • [x] I'm refactoring or cleaning up some code
    opened by nlarge-google 4
  • chore(deps): update dependency black to v22.12.0

    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | black (changelog) | ==22.10.0 -> ==22.12.0 |


    Release Notes

    psf/black

    v22.12.0

    Compare Source

    Preview style
    • Enforce empty lines before classes and functions with sticky leading comments (#​3302)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#​3348)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#​3307)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#​3370)
    Configuration
    • Fix incorrectly applied .gitignore rules by considering the .gitignore location and the relative path to the target file (#​3338)
    • Fix incorrectly ignoring .gitignore presence when more than one source directory is specified (#​3336)
    Parser
    • Parsing support has been added for walruses inside generator expression that are passed as function args (for example, any(match := my_re.match(text) for text in texts)) (#​3327).
    Integrations
    • Vim plugin: Optionally allow using the system installation of Black via let g:black_use_virtualenv = 0 (#3309)

    Configuration

    📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Never, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about these updates again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    dependencies 
    opened by renovate-bot 0
  • Fix: Onboard HRRR processes in NOAA ETL

    Description

    Notes:

    • If you are adding or editing a dataset, please specify the dataset folder involved, e.g. datasets/google_trends.
    • If you are an external contributor, please contact the Google Cloud Datasets team for your proposed dataset or feature.
    • If you are adding or editing a dataset, please do it one dataset at a time. Have all the code changes inside a single datasets/noaa folder.
    opened by nlarge-google 0
  • chore(deps): update dependency pandas-gbq to v0.18.0

    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | pandas-gbq | ==0.17.9 -> ==0.18.0 |


    Release Notes

    googleapis/python-bigquery-pandas

    v0.18.0

    Compare Source

    Features
    • Map "if_exists" value to LoadJobConfig.WriteDisposition (#​583) (7389cd2)

    Configuration

    📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Never, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about these updates again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    dependencies 
    opened by renovate-bot 1
  • chore(deps): update dependency flake8 to v6

    Mend Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | flake8 (changelog) | ==5.0.4 -> ==6.0.0 |


    Release Notes

    pycqa/flake8

    v6.0.0

    Compare Source


    Configuration

    📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Never, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about these updates again.


    • [ ] If you want to rebase/retry this PR, check this box

    This PR has been generated by Mend Renovate. View repository job log here.

    dependencies 
    opened by renovate-bot 0
  • Feat: Onboard Af dag notifications

    Description

    Notes:

    • If you are adding or editing a dataset, please specify the dataset folder involved, e.g. datasets/google_trends.
    • If you are an external contributor, please contact the Google Cloud Datasets team for your proposed dataset or feature.
    • If you are adding or editing a dataset, please do it one dataset at a time. Have all the code changes inside a single datasets/af_dag_notifications folder.
    opened by nlarge-google 0
Releases: v5.2.0