Automatically detect changes made to the official Telegram sites.

Overview

🕷 Telegram Web Crawler

This project automatically detects changes made to the official Telegram sites. This helps anticipate upcoming updates and other news (new vacancies, API updates, etc.).

| Name | Commits | Status |
| --- | --- | --- |
| Site updates tracker | Commits | Fetch new content of tracked links to files |
| Site links tracker | Commits | Generate or update list of tracked links |
  • passing – new changes
  • failing – no changes

Subscribe to the channel with alerts to stay updated. A copy of the Telegram websites is stored here.

GitHub pretty diff

How it works

  1. Link crawling runs as often as possible. It starts from the home page of the site, detects relative and absolute sub-links, and repeats the operation recursively. It writes a list of unique links for later content comparison. Links can also be added by hand to help the script find hidden pages (pages that nothing links to). A system of rules for the link crawler handles exceptions.

  2. Content crawling runs as often as possible and uses the list of links collected in step 1. Going through that list, it fetches the content of each link and builds a tree of subfolders and files. It removes all dynamic content from the files.

  3. GitHub Actions. Everything works without dedicated servers. You can fork this repository and run your own tracker. Workflows launch the scripts and commit the changes. All file changes are tracked by Git and nicely displayed on GitHub. A workflow run should succeed only if there are changes on the Telegram website; otherwise it should fail. If the run succeeds, notifications can be sent to a Telegram channel and so on.
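The first two steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the helper names (`LinkCollector`, `extract_links`, `link_to_file_path`, `strip_dynamic_content`), the `data/` output folder, and the idea that "dynamic content" means numeric cache-busting query strings are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def extract_links(page_url, html, base_domain='telegram.org'):
    """Return unique scheme-less links ('host/path') found on one page."""
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        # urljoin resolves relative links and leaves absolute ones as they are.
        parts = urlsplit(urljoin(page_url, href))
        host = parts.hostname or ''
        if parts.scheme not in ('http', 'https'):
            continue  # skip mailto:, tg:, javascript: and similar
        if host != base_domain and not host.endswith('.' + base_domain):
            continue  # skip links that leave the tracked sites
        links.add(host + parts.path)  # query string and fragment are dropped
    return links


def link_to_file_path(link, root='data'):
    """Map 'host/some/page/' to 'data/host/some/page.html' (assumed layout)."""
    host, _, path = link.partition('/')
    page = path.strip('/') or 'index'
    return str(PurePosixPath(root, host, page + '.html'))


# Assumption: one kind of dynamic content is a numeric cache-busting
# query string on asset URLs, e.g. href="/css/main.css?217".
CACHE_BUSTER = re.compile(r'\?\d+(?=")')


def strip_dynamic_content(html):
    """Remove parts of a page that change without a real content change."""
    return CACHE_BUSTER.sub('', html)
```

A recursive crawler would call `extract_links` on every newly discovered link until no new links appear, then save each cleaned page under the path returned by `link_to_file_path`.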

FAQ

Q: How often is "as often as possible"?

A: TL;DR: the content update action runs every ~10 minutes. More info:

Q: Why are there two separate crawl scripts instead of one?

A: Because the original idea was to update the tracked links once an hour, so it was convenient to use separate scripts and workflows. After the Telegram 7.7 update, I realised that finding new blog posts that slowly is a bad idea.

Q: Why does the script for sending alerts have a while loop?

A: Because the GitHub API doesn't return information about a commit immediately after a push to the repository. Therefore, the script waits for the information to appear...
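The waiting logic might look like the sketch below. It is an illustration, not the code from the alert script: `wait_for_commit_info` and the `fetch_commit` callable are hypothetical names, and the sketch uses a bounded retry loop in place of an open-ended while loop so that it cannot hang forever.

```python
import time


def wait_for_commit_info(fetch_commit, sha, attempts=10, delay=5.0):
    """Poll until fetch_commit(sha) returns data or the attempts run out.

    fetch_commit is any callable that returns the commit data, or None
    while the API does not know about the commit yet (assumed contract).
    """
    for attempt in range(attempts):
        info = fetch_commit(sha)
        if info is not None:
            return info
        if attempt < attempts - 1:
            time.sleep(delay)  # give the API time to index the new commit
    raise TimeoutError(f'no data for commit {sha} after {attempts} attempts')
```

Passing the fetch function in as a parameter keeps the retry logic independent of the HTTP client and easy to test with a fake.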

Q: Why are you using a GitHub Personal Access Token in the actions/checkout workflow step?

A: To be able to trigger other workflows via the on push trigger. More info:

Q: Why are you using a GitHub PAT in make_and_send_alert.py?

A: To raise the GitHub API rate limits.
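As a rough illustration, a token is typically sent in the `Authorization` header of each GitHub REST API request. The sketch below only builds the request and does not send it; `build_commit_request` is a hypothetical helper, not a function from make_and_send_alert.py.

```python
import urllib.request


def build_commit_request(owner, repo, sha, token=None):
    """Build a request for the GitHub REST 'get a commit' endpoint."""
    url = f'https://api.github.com/repos/{owner}/{repo}/commits/{sha}'
    request = urllib.request.Request(url)
    request.add_header('Accept', 'application/vnd.github+json')
    if token:
        # Authenticated requests get a higher rate limit than anonymous ones.
        request.add_header('Authorization', f'Bearer {token}')
    return request
```

In a workflow, the token would normally come from a repository secret exposed as an environment variable instead of being hard-coded.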

TODO list

  • add storing history of content using hashes;
  • add storing hashes of image, svg, video.

Example of link crawler rules configuration

CRAWL_RULES = {
    # every rule is a regex
    # an empty string matches any url
    # allow rules have higher priority than deny rules
    'translations.telegram.org': {
        'allow': {
            r'^[^/]*$',  # root
            r'org/[^/]*/$',  # 1 lvl sub
            r'/en/[a-z_]+/$'  # 1 lvl after /en/
        },
        'deny': {
            '',  # all
        }
    },
    'bugs.telegram.org': {
        'deny': {
            '',    # deny the whole subdomain
        },
    },
}
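One possible way to evaluate such rules is sketched below. It assumes links are stored without a scheme (as in the hidden URLs list), that rules are looked up by the host part of the link, and that patterns are applied with `re.search`; `should_crawl` is a hypothetical name and the project's own matching code may differ.

```python
import re

CRAWL_RULES = {
    'translations.telegram.org': {
        'allow': {r'^[^/]*$', r'org/[^/]*/$', r'/en/[a-z_]+/$'},
        'deny': {''},
    },
    'bugs.telegram.org': {
        'deny': {''},
    },
}


def should_crawl(link, rules=CRAWL_RULES):
    """Return True if a scheme-less link such as 'host/path/' may be crawled."""
    host = link.split('/', 1)[0]
    domain_rules = rules.get(host)
    if not domain_rules:
        return True  # no rules for this host: crawl everything
    # An allow match wins over any deny match.
    if any(re.search(p, link) for p in domain_rules.get('allow', ())):
        return True
    # The empty pattern matches every string, so '' denies the whole host.
    return not any(re.search(p, link) for p in domain_rules.get('deny', ()))
```

With these rules, `translations.telegram.org/en/android/` is crawled because it matches the third allow pattern, while any deeper page under it falls through to the catch-all deny rule.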

Current hidden urls list

HIDDEN_URLS = {
    # 'corefork.telegram.org', # disabled

    'telegram.org/privacy/gmailbot',
    'telegram.org/tos',
    'telegram.org/tour',
    'telegram.org/evolution',

    'desktop.telegram.org/changelog',
}

License

Licensed under the MIT License.

Owner
Il'ya
Telegram: https://t.me/MarshalX