This is a simple website crawler that asks the user for a website address, then crawls it to find specific data.

Last update: Jan 10, 2022

Related tags

Web Crawling Website-Crawler-Python-

Overview

Website-Crawler-Python

This is a simple website crawler that asks the user for a website address, then crawls it to find specific data. After receiving the address, it asks the user for a crawling depth, which must lie within the number of links found at that address.
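The README does not say exactly how the depth value is applied. One plausible reading, sketched here under that assumption, is that the crawler collects the links on the start page and then visits the first `depth` of them. The sketch uses the standard library's `html.parser` in place of BeautifulSoup and takes a hypothetical `fetch` callable (URL in, HTML text out) so it can run without network access. `LinkCollector`, `find_links`, and `crawl` are illustrative names, not the project's own code.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags, in page order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links against the page URL
        # Skip mailto:, javascript: and similar schemes, and drop duplicates
        if urlparse(url).scheme in ("http", "https") and url not in self.links:
            self.links.append(url)


def find_links(base_url: str, html: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def crawl(start_url: str, depth: int, fetch: Callable[[str], str]) -> dict[str, str]:
    """Fetch the start page, then the first `depth` links found on it.

    `fetch` is a hypothetical callable that maps a URL to its HTML text.
    Returns {url: html} for every page visited.
    """
    start_html = fetch(start_url)
    links = find_links(start_url, start_html)
    # Assumption: the depth must lie between 0 and the number of links found
    if not 0 <= depth <= len(links):
        raise ValueError(f"depth must be between 0 and {len(links)}")
    pages = {start_url: start_html}
    for url in links[:depth]:
        pages[url] = fetch(url)
    return pages
```

In a real run, `fetch` could wrap an HTTP client, for example a shared `urllib3.PoolManager()` instance whose `request("GET", url).data` bytes are decoded to text. Keeping `fetch` injectable lets the crawl logic be tested with a dictionary of fake pages.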

The crawler takes three inputs:

A website address
An integer value for the crawling depth
A user-specified regular expression for finding custom data
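A minimal sketch of validating the three inputs before crawling, assuming a missing scheme should default to `https://`. `parse_inputs` and `find_user_data` are hypothetical helpers, not the project's code; in the interactive program the three strings would come from `input()` calls.

```python
import re
from urllib.parse import urlparse


def parse_inputs(url: str, depth_text: str, pattern_text: str) -> tuple[str, int, re.Pattern]:
    """Validate the website address, crawling depth and user regex (hypothetical helper)."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # assumption: default to HTTPS when no scheme is typed
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid website address: {url!r}")

    depth = int(depth_text)  # raises ValueError if the text is not an integer
    if depth < 0:
        raise ValueError("crawling depth must not be negative")

    try:
        pattern = re.compile(pattern_text)
    except re.error as exc:
        raise ValueError(f"invalid regular expression: {exc}") from exc
    return url, depth, pattern


def find_user_data(pattern: re.Pattern, text: str) -> list[str]:
    """Return each distinct full match of the user's pattern, in order of first appearance."""
    # group(0) is used instead of findall() so patterns with groups still yield whole matches
    return list(dict.fromkeys(m.group(0) for m in pattern.finditer(text)))
```

Compiling the expression up front means a typo in the user's pattern is reported immediately rather than after the pages have been downloaded.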

General tasks:

Find all Norwegian mobile numbers and save them to a text file.
Find all sub-links inside the given website and save them to a text file.
Save the website's raw HTML code to a text file.
Find all email addresses and save them to a text file.
Find all comments used in the website and save them to a text file.
Find the five most frequently used words and print them to the terminal.
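The tasks above can be sketched with the standard library alone. This is an illustration, not the project's implementation: `html.parser` stands in for BeautifulSoup, the Norwegian mobile pattern assumes 8-digit numbers starting with 4 or 9 (optionally prefixed with +47 or 0047), the email pattern is deliberately simplified, and `PageParser`, `analyze_page`, `save_results`, and the output file names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path

# Assumption: 8 digits starting with 4 or 9, optional +47/0047 prefix, e.g. "412 34 567"
NORWEGIAN_MOBILE = re.compile(
    r"(?<!\d)(?:(?:\+47|0047)[ ]?)?[49]\d{2}[ ]?\d{2}[ ]?\d{3}(?!\d)"
)
# Deliberately simple email pattern; it will not cover every valid address
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class PageParser(HTMLParser):
    """Collects raw href values, HTML comments and visible text from one page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.comments: list[str] = []
        self.text_parts: list[str] = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_comment(self, data):
        self.comments.append(data.strip())

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def unique(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def analyze_page(html: str) -> dict[str, list]:
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return {
        "mobile_numbers": unique(m.group(0) for m in NORWEGIAN_MOBILE.finditer(text)),
        "links": unique(parser.links),
        "emails": unique(EMAIL.findall(html)),  # raw HTML, so mailto: links count too
        "comments": parser.comments,
        "top_words": Counter(words).most_common(5),
    }


def save_results(html: str, results: dict[str, list], out_dir: str = ".") -> None:
    """Write the raw HTML and each result list to its own text file, then print top words."""
    out = Path(out_dir)
    out.joinpath("raw_html.txt").write_text(html, encoding="utf-8")
    for name in ("mobile_numbers", "links", "emails", "comments"):
        out.joinpath(f"{name}.txt").write_text("\n".join(results[name]), encoding="utf-8")
    for word, count in results["top_words"]:
        print(f"{word}: {count}")
```

Phone numbers are matched only in the visible text to avoid false hits on numeric IDs inside attributes, while emails are searched in the raw HTML so that addresses in `mailto:` links are included.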

This is a Python-based project that relies on the following libraries:

RegEx
Urllib3
BeautifulSoup 4
Counter from collections


Owner

Faisal Ahmed

Command line program to download documents from web portals.

Scrape the 42 intranet's e-learning videos in a single click

fork huanghyw/jd_seckill

A Python crawler that automatically fetches the lyrics of every song by a given singer on QQ Music.

Audio media crawler for lbry.

A Python package that scrapes Google News article data while remaining undetected by Google.

Half-price rush buying on Taobao and Tmall: grab TVs, grab Moutai, and beat the scalpers.

A web scraper that exports your entire WhatsApp chat history.

Grab Moutai on JD.com (many successful flash-sale purchases and discussion), Tmall rush buying, money-making tips, and more.

A scalable frontier for web crawlers

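Uses Python to crawl the employment websites of several major universities in Jiangsu and notifies users in three ways: via WeChat, direct command-line output, and Windows balloon notifications.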

Python-based web scraper that can discover JavaScript files and parse them for juicy information (API keys, IPs, hidden paths, etc.)

A helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.

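Automatic quiz answering and score boosting for the "Four Histories" quizzes on 中国大学生在线 (China University Students Online); currently only the Heroes section is supported.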

A module for CME that spiders hashes across the domain with a given hash.

👁️ Tool for Data Extraction and Web Requests.

Python web scraper

Web crawling framework based on asyncio.

Dex-scrapper - Hobby project for scraping DEX data on VeChain

Iptvcrawl - A Scrapy project for crawling IPTV playlists
