A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

Last update: Dec 13, 2021

Overview

combined-shop-scraper

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items.

Features

Define an input file components.json with components to be scraped and the source urls
Find the cheapest order combination including the shipping prices
Get alarm prices when single components are below a defined price
Easily expand for new shops (scraping basic know-how required). Default basic support for notebooksbilliger, cyberport and future-x

Usage

JSON file definition

The default name of the input JSON file is components.json and must be located in the same folder as scraper.py. This is the basic structure of the file:

{
  "component1": {
    "alarm_price": 260,
    "quantity": 1,
    "urls": [
      "https://www.someshop.com/component1",
      "https://www.someshop.com/component1-alternative",
      "https://www.anothershop.com/component1-alternative"]
  },
  "component2": {
    "urls": [
      "https://www.someshop.com/component2",
      "https://www.anothershop.com/component2",
      "https://www.onemoreshop.com/component2"]
  }

The component name and at least one url are mandatory. It is possible to add several urls from the same shop for the same component if there are some alternatives for this. The quantity of each component defaults to 1, the alarm price is optional.

Execution

Just call the script scraper.py from within the folder, so the components.json file can be found. It will print an overview of the ideal order to minimize the overall cost. The program runs just once and does not keep tracking prices in the background. As usual with scraping, be gentle and fair and don't abuse this program.

Addition of new shops

If you want to add a new shop, you need to edit the file shops.py and:

Enter the significant part of the shop url in the method Shop._get_shops_dict and define a new class type (child of Shop)
Implement the methods _process_soup and get_shipping_cost for the new class. Use the existing classes as reference for the data you need to scrap.
Add your new urls to the input file!

License

See the LICENSE for license details.

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

Related tags

Overview

combined-shop-scraper

Features

Usage

JSON file definition

Execution

Addition of new shops

License

Owner

👁️ Tool for Data Extraction and Web Requests.

Example of scraping a paginated API endpoint and dumping the data into a DB

python+selenium实现的web端自动打卡 + 每日邮件发送 + 金山词霸每日一句 + 毒鸡汤（从2月份稳定运行至今）

Screen scraping and web crawling framework

Amazon web scraping using Scrapy Framework

Async Python 3.6+ web scraping micro-framework based on asyncio

LSpider 一个为被动扫描器定制的前端爬虫

Scrapes Every Email Address of Every Society in Every University

Telegram Group Scrapper

A database scraper created with mechanical soup and sqlite

Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Semplice scraper realizzato in Python tramite la libreria BeautifulSoup

12306抢票脚本

Script for scrape user data like "id,username,fullname,followers,tweets .. etc" by Twitter's search engine .

Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info

Kusonime scraper using python3

EBay-email-tracker - Scapes an entire search page of a particular item on eBay and sends regular updates to an email address

A simple python script to fetch the latest covid info

tweet random sand cat pictures

A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

Related tags

Overview

combined-shop-scraper

Features

Usage

JSON file definition

Execution

Addition of new shops

License

Owner

👁️ Tool for Data Extraction and Web Requests.

Example of scraping a paginated API endpoint and dumping the data into a DB

python+selenium实现的web端自动打卡 + 每日邮件发送 + 金山词霸 每日一句 + 毒鸡汤（从2月份稳定运行至今）

Screen scraping and web crawling framework

Amazon web scraping using Scrapy Framework

Async Python 3.6+ web scraping micro-framework based on asyncio

LSpider 一个为被动扫描器定制的前端爬虫

Scrapes Every Email Address of Every Society in Every University

Telegram Group Scrapper

A database scraper created with mechanical soup and sqlite

Github scraper app is used to scrape data for a specific user profile created using streamlit and BeautifulSoup python packages

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Semplice scraper realizzato in Python tramite la libreria BeautifulSoup

12306抢票脚本

Script for scrape user data like "id,username,fullname,followers,tweets .. etc" by Twitter's search engine .

Scrape data on SpaceX: Capsules, Rockets, Cores, Roadsters, SpaceX Info

Kusonime scraper using python3

EBay-email-tracker - Scapes an entire search page of a particular item on eBay and sends regular updates to an email address

A simple python script to fetch the latest covid info

tweet random sand cat pictures

python+selenium实现的web端自动打卡 + 每日邮件发送 + 金山词霸每日一句 + 毒鸡汤（从2月份稳定运行至今）