A universal package of scraper scripts for humans

Scrapera


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to bundle common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.
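The browser-free approach described above can be sketched with the Python standard library alone: each request goes straight to a JSON endpoint, and `asyncio` runs many of them concurrently. This is not Scrapera's code; `fetch_json`, `scrape_endpoints`, the `User-Agent` string and the default concurrency limit of 8 are illustrative assumptions, and real endpoints, headers and rate limits depend on the site being scraped.

```python
import asyncio
import json
import urllib.request
from typing import Any, Callable, Iterable


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Blocking GET of a JSON endpoint (hypothetical helper, stdlib only)."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def scrape_endpoints(
    urls: Iterable[str],
    fetch: Callable[[str], Any] = fetch_json,
    max_concurrency: int = 8,
) -> list[Any]:
    """Fetch many endpoints concurrently without launching a browser.

    Each blocking request runs in a worker thread via asyncio.to_thread,
    and a semaphore caps how many requests are in flight at once.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> Any:
        async with semaphore:
            return await asyncio.to_thread(fetch, url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(url) for url in urls))
```

For example, `asyncio.run(scrape_endpoints(urls))` returns the decoded JSON bodies in the same order as `urls`. Passing a custom `fetch` function keeps the coroutine testable offline; a production scraper would more likely use an async HTTP client such as aiohttp instead of worker threads.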

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Getting Started

    Prerequisites

    The prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

    To use any sub-module, import it, instantiate the scraper and execute it. For example, to download a Vimeo video:

    # Import the scraper class for the target site
    from scrapera.video.vimeo import VimeoScraper

    scraper = VimeoScraper()
    # Download the video at this URL in 540p quality
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders in the respective modules.

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.
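A new scraper contributed along these lines would presumably follow the same import, instantiate and execute pattern shown under Usage. The skeleton below is a hypothetical sketch: the `ExampleScraper` class, its injectable `fetch` parameter and the `scrape(url, out_path)` signature are assumptions for illustration, not Scrapera's actual base class or conventions; CONTRIBUTING remains the authority on the real guidelines.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any, Callable, Optional


class ExampleScraper:
    """Hypothetical skeleton for a new scraper module (not Scrapera's API)."""

    def __init__(self, fetch: Optional[Callable[[str], Any]] = None) -> None:
        # An injectable fetcher lets the scraper be tested without network access
        self.fetch = fetch or self._default_fetch

    @staticmethod
    def _default_fetch(url: str) -> Any:
        # Query a public JSON endpoint directly instead of driving a browser
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))

    def scrape(self, url: str, out_path: str) -> Path:
        """Fetch data from the endpoint and save it as pretty-printed JSON."""
        data = self.fetch(url)
        path = Path(out_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        return path
```

Usage mirrors the Vimeo example: `ExampleScraper().scrape("https://example.com/api/items", "data/items.json")`, where both the URL and the output path are placeholders.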

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements

    Minecraft Item Scraper

    Minecraft Item Scraper. To run, first make sure the BeautifulSoup module is installed (pip install bs4), then run: python minecraft_items.py folder-to-save-ima

    Jaedan Calder 1 Dec 29, 2021
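    Push notifications for JD Cloud Wireless Treasure (京东云无线宝) points, with support for viewing point usage across multiple devices

    JDRouterPush. Project overview: this project calls the JD Cloud Wireless Treasure API to push a scheduled daily report of points earned, helping you keep an eye on the key information. Changelog: 2021-03-02: query the bound JD account; improved notification layout; script update check; support for the Server酱 (ServerChan) Turbo edition. 2021-02-25: multi-device queries; query today's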

    雷疯 199 Dec 12, 2022
    Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on-demand scraping cluster

    Scrapy Cluster. This Scrapy project uses Redis and Kafka to create a distributed

    Hanh Pham Van 0 Jan 06, 2022
    Footballmapies - Football mapies for learning web scraping and the use of the gmplot module in Python

    1 Jan 28, 2022
    Simple Python tool for swapping Latin letters with Cyrillic ones and vice versa in txt, docx and pdf files in the Serbian language

    Alpha Swap. English: This is a simple Python tool for swapping Latin letters with Cyrillic ones and vice versa, in txt, docx and pdf fil

    Aleksandar Damnjanovic 3 May 31, 2022
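Latin-to-Cyrillic swapping of this kind is mostly a letter-by-letter lookup, except that the Serbian Latin digraphs lj, nj and dž each correspond to a single Cyrillic letter (љ, њ, џ). The sketch below is not Alpha Swap's implementation: the function names are hypothetical, and it handles plain text only, not docx or pdf files. It always treats lj, nj and dž as digraphs, which is wrong for a handful of words such as injekcija (ињекција).

```python
# Serbian Latin -> Serbian Cyrillic, lowercase forms (hypothetical helper tables)
_DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}
_SINGLE = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}
# Every Cyrillic letter maps back to exactly one Latin form
_TO_LATIN = {cyr: lat for lat, cyr in {**_SINGLE, **_DIGRAPHS}.items()}


def latin_to_cyrillic(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair.lower() in _DIGRAPHS:
            cyr = _DIGRAPHS[pair.lower()]
            # "Lj" and "LJ" both become uppercase "Љ"; "lj" stays lowercase
            out.append(cyr.upper() if pair[0].isupper() else cyr)
            i += 2
            continue
        ch = text[i]
        cyr = _SINGLE.get(ch.lower())
        # Digits, punctuation and non-Serbian letters pass through unchanged
        out.append(ch if cyr is None else (cyr.upper() if ch.isupper() else cyr))
        i += 1
    return "".join(out)


def cyrillic_to_latin(text: str) -> str:
    out = []
    for ch in text:
        lat = _TO_LATIN.get(ch.lower())
        if lat is None:
            out.append(ch)
        elif ch.isupper():
            out.append(lat.capitalize())  # "Љ" -> "Lj", "Ч" -> "Č"
        else:
            out.append(lat)
    return "".join(out)
```

Scanning two characters ahead before falling back to a single letter is what keeps "Ljubljana" from becoming "Лјублјана".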
    Scraping script for stats on the COVID-19 pandemic status in Chiba Prefecture, Japan

    About: A script that converts Chiba Prefecture's detailed regional infection statistics (Excel file) to CSV and outputs daily infection totals by region. Requirements: a POSIX-compatible shell, e.g. GNU Bash (1); curl (1); python = 3.8; pandas = 1.1.

    Conv4Japan 1 Nov 29, 2021
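The aggregation step described above (turning per-case records into daily totals by region) can be sketched with the standard `csv` module. This is not the Conv4Japan script: the `date` and `region` column names are hypothetical, and the real Excel file would first have to be exported to CSV and its Japanese headers mapped onto them.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def daily_counts_by_region(case_csv: str) -> list[tuple[str, str, int]]:
    """Aggregate one-row-per-case records into sorted (date, region, count) rows.

    Assumes hypothetical 'date' and 'region' columns in the input CSV.
    """
    reader = csv.DictReader(io.StringIO(case_csv))
    counts = Counter((row["date"], row["region"]) for row in reader)
    return sorted((date, region, n) for (date, region), n in counts.items())


def write_counts(rows: Iterable[tuple[str, str, int]], out: TextIO) -> None:
    """Write the aggregated rows as CSV with a header line."""
    writer = csv.writer(out)
    writer.writerow(["date", "region", "cases"])
    writer.writerows(rows)
```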
    Google Developer Profile Badge Scraper

    Google Developer Profile Badge Scraper. It is a Google Developer Profile web scraper that scrapes for specific badges in a user's Google Developer Pro

    Hemant Sachdeva 2 Feb 22, 2022
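    Crawls the day's announcements from major SRCs (security response centers) | A small tool that sends notifications via WeChat | Bounty tool

    OnTimeHacker V1.0. OnTimeHacker is a small tool that crawls the day's announcements from major SRCs and sends notifications via WeChat. The current version of OnTimeHacker is 1.0 and it supports 24 SRCs, listed below: 360, iQIYI, Alibaba, Baidu, Bilibili, Beike, Boss, 58, Cainiao, Didi, Douyu, Ele.me, Guazi, Hehe, Xiangdao, JD,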

    Bywalks 95 Jan 07, 2023
    Command line program to download documents from web portals.

    Command-line document download made easy. Highlights: list available documents in JSON format or download them; filter documents using string matching re

    16 Dec 26, 2022
    Web-scraping - Program that scrapes a website for a collection of quotes, picks one at random and displays it

    web-scraping. Program that scrapes a website for a collection of quotes, picks on

    Manvir Mann 1 Jan 07, 2022
    Snowflake database loading utility with Scrapy integration

    Snowflake Stage Exporter. Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON-serializable objects into S

    Oleg T. 0 Dec 06, 2021
    VG-Scraper is a Python program that uses the BeautifulSoup module to let anyone scrape content from a website. The program lets you enter a number as input, where each unit corresponds to one news article.

    VG-Scraper. VG-Scraper is a convenient program that finds all the news articles for you instead of you finding them one by one. Installing [Linux]: Open a term

    3 Feb 13, 2022
    Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil

    Shopee Scraper. A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil. The project was crea

    Paulo DaRosa 5 Nov 29, 2022
    A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line!

    Social Media Scraper. A utility library to scrape data from TikTok, Instagram, Twitch, YouTube, Twitter or Reddit in one line!

    2 Aug 03, 2022
    An application that, given a URL, crawls a web page, extracts all its words, and sorts and counts them.

    Web-Scrapping-1. An application that, given a URL, crawls a web page, extracts all its words, and sorts and counts them. Installation Using the package manage

    adriano atambo 1 Jan 16, 2022
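A crawl-and-count application of this kind can be sketched with the standard library: fetch the page, keep only visible text, then count words with `collections.Counter`. The helper names are hypothetical and this is not the Web-Scrapping-1 implementation; treating runs of Unicode letters (`[^\W\d_]+`) as words is one assumption among many possible tokenizations.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def fetch_html(url: str) -> str:
    """Download a page and decode it with the charset the server declares."""
    request = urllib.request.Request(url, headers={"User-Agent": "word-count-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def count_words(html: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first, ties sorted alphabetically."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    words = re.findall(r"[^\W\d_]+", text)
    return sorted(Counter(words).items(), key=lambda item: (-item[1], item[0]))
```

A typical call would be `count_words(fetch_html(url))` for some page URL of your choice.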
    Basic-html-scraper - A complete how-to of web scraping with Python for beginners

    basic-html-scraper. Code from a YouTube video. This video includes a complete how-to of w

    John 12 Oct 22, 2022
    Nekopoi scraper using python3

    Features: scrape from URL. Todo: [+] Search by genre [+] Search by query [+] Scrape from homepage. Example # Hentai Scraper from nekopoi import Hent

    MhankBarBar 9 Apr 06, 2022
    Tool to scan for secret files on HTTP servers

    snallygaster. Finds file leaks and other security problems on HTTP servers. What? snallygaster is a tool that looks for files accessible on web servers

    Hanno Böck 2k Dec 28, 2022
    Find papers by keywords and venues, then download them automatically

    paper finder. Find papers by keywords and venues, then download them automatically. How to use this? Search CLI: python search.py -k "knowledge tracing,kn

    Jiahao Chen (TabChen) 2 Dec 15, 2022
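Keyword and venue filtering of the kind described above can be sketched as a case-insensitive title match. The `Paper` record, the `find_papers` function and the comma-separated keyword format (suggested by the `-k` example above) are assumptions; this is not paper finder's implementation, which also handles searching and downloading.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Paper:
    """Hypothetical minimal record for a search result."""
    title: str
    venue: str
    year: int


def find_papers(papers: Iterable[Paper], keywords: str, venues: Iterable[str] = ()) -> list[Paper]:
    """Keep papers whose title contains any comma-separated keyword.

    Matching is case-insensitive; an empty venue filter accepts every venue.
    """
    terms = [k.strip().lower() for k in keywords.split(",") if k.strip()]
    wanted_venues = {v.lower() for v in venues}
    result = []
    for paper in papers:
        title = paper.title.lower()
        if not any(term in title for term in terms):
            continue
        if wanted_venues and paper.venue.lower() not in wanted_venues:
            continue
        result.append(paper)
    return result
```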
    Crawler for the Fundamentus.com site using the Scrapy framework, covering both the detailed tab and the summary tab.

    Crawler for the Fundamentus.com site using the Scrapy framework, covering both the detailed tab and the summary tab. (All the information)

    Guilherme Silva Uchoa 3 Oct 04, 2022