Screen scraping and web crawling framework

Last update: Jun 21, 2021

Overview

Pomp

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

Pure Python
Only one dependency on Python 2.x: concurrent.futures (the backport of the standard-library package)
Supports single-file applications; Pomp doesn't force a specific project layout or impose other restrictions.
Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
Extensible networking: you may use any sync or async method.
No parsing libraries in the core; use your preferred approach.
Pomp instances may be distributed and are designed to work with an external queue.
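
As a rough illustration of the "any sync or async method" point, the sketch below runs a blocking fetch function over several URLs with concurrent.futures. It does not use Pomp's own classes; fetch_all, safe_fetch and fake_fetch are hypothetical names introduced only for this example, and fake_fetch stands in for a real network call so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Run a blocking ``fetch(url)`` callable over ``urls`` in a thread pool.

    Hypothetical helper, not part of Pomp. Returns (url, result) pairs in
    input order; an exception raised for one URL is returned as that URL's
    result instead of aborting the whole batch.
    """
    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # keep going when one URL fails
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the input URLs.
        return list(zip(urls, pool.map(safe_fetch, urls)))


def fake_fetch(url):
    # Stand-in for a real network call (assumption for this example).
    if url.endswith("/broken"):
        raise IOError("cannot fetch " + url)
    return "<html>" + url + "</html>"


results = fetch_all(
    ["http://example.com/a", "http://example.com/broken"],
    fake_fetch,
)
```

Any other blocking callable could be passed in place of fake_fetch; the thread pool only supplies the concurrency.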

Pomp makes no attempt to accommodate:

redirects
proxies
caching
database integration
cookies
authentication
etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
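
A minimal sketch of that idea follows, assuming a small wrapper class around a requests session. RequestsDownloader and its fetch method are hypothetical names, and the class does not subclass Pomp's real downloader base class, which is not shown here. The only requests calls used are Session.get with its proxies, allow_redirects and timeout arguments, and Response.raise_for_status.

```python
class RequestsDownloader:
    """Hypothetical downloader that delegates HTTP work to requests."""

    def __init__(self, session=None, proxies=None, timeout=10):
        if session is None:
            # Third-party dependency; imported lazily so the class can
            # also be exercised with a stub session in tests.
            import requests
            session = requests.Session()
        self.session = session
        self.proxies = proxies
        self.timeout = timeout

    def fetch(self, url):
        # requests follows redirects and routes through the given
        # proxies, two of the features Pomp itself leaves out.
        response = self.session.get(
            url,
            proxies=self.proxies,
            allow_redirects=True,
            timeout=self.timeout,
        )
        response.raise_for_status()  # turn HTTP 4xx/5xx into exceptions
        return response.text
```

Cookies are kept across calls by the session object, so reusing one RequestsDownloader instance for a whole crawl would also cover that item from the list above.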

Pomp examples

Pomp docs

Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.

Owner

Evgeniy Tatarkin

A scalable frontier for web crawlers

Pythonic Crawling / Scraping Framework based on Non Blocking I/O operations.

tweet random sand cat pictures

Dailyiptvlist.com Scraper With Python

Audio media crawler for lbry.

⏬ Dumb downloader that scrapes the web

🤖 Threaded Scraper to get discord servers from disboard.org written in python3

Crawl the information of a given keyword on Google search engine

jd_maotai rpa: a Selenium-driven RPA bot for flash-sale purchasing on JD

Web crawling framework based on asyncio.

Dude is a very simple framework for writing web scrapers using Python decorators

A high-level distributed crawling framework.

Source code for web scraping with the Python library Selenium.

A web scraper which checks price of a product regularly and sends price alerts by email if price reduces.

A web scraping API for MDL (My Drama List) for Python.

Simple scraper written in Python using the BeautifulSoup library

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Automatically submits the daily body temperature report (GitHub Actions)

Binance Smart Chain Contract Scraper + Contract Evaluator

The first public repository that provides free BUBT website scraping API script on Github.