This program scrapes information and images for movies and TV shows.

Overview

Media-WebScraper

Summary

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

For a given list of media, the program scrapes and saves general information, images and (if specified) episode information for each item in the list.

General Information (default):

Saved as a .txt file

The following general information is scraped:

  • Title
  • Release date
  • Runtime
  • Genre
  • Director
  • Cast
  • Plot description

Additional information saved:

  • Source database used for scrape
  • ID for media in source database
  • Poster image link
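As an illustration of how the fields above might end up in a .txt file, here is a minimal sketch. The field labels, the `Field: value` layout, the `N/A` placeholder and the `info.txt` file name are all assumptions made for this example; the program's actual file layout may differ.

```python
import os

# Hypothetical field labels and order; the real program's layout may differ.
FIELDS = [
    "Title", "Release date", "Runtime", "Genre", "Director", "Cast",
    "Plot description", "Source", "Source ID", "Poster link",
]


def format_info(info):
    """Return the text-file body: one 'Field: value' line per known field."""
    lines = []
    for field in FIELDS:
        value = info.get(field, "N/A")  # assumed placeholder for missing data
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)  # e.g. several genres or cast members
        lines.append("{}: {}".format(field, value))
    return "\n".join(lines) + "\n"


def save_info(info, folder, filename="info.txt"):
    """Write the formatted information into `folder` and return the file path."""
    path = os.path.join(folder, filename)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_info(info))
    return path
```

Calling `save_info({"Title": "Alien", "Cast": ["Sigourney Weaver"]}, ".")` would write one line per field, with `N/A` for anything not supplied.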

Images (default):

Saved as a .jpg file

This will scrape the poster.

Episode Information (if specified):

Saved as a .csv file

The following information is scraped for each episode of a TV show:

  • Season number
  • Episode number
  • Episode title
  • Episode air date
  • Episode description
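The episode fields above map naturally onto CSV columns. The sketch below uses Python's standard `csv` module; the column names and the `write_episodes` helper are hypothetical and are not taken from the program's source.

```python
import csv
import io

# Hypothetical column names, one per episode field listed above.
COLUMNS = ["season", "episode", "title", "air_date", "description"]


def write_episodes(episodes, handle):
    """Write a header row followed by one row per episode dictionary."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for episode in episodes:
        writer.writerow(episode)


# Example data invented for illustration only.
episodes = [
    {"season": 1, "episode": 1, "title": "Pilot",
     "air_date": "2020-01-01", "description": "A first, short episode."},
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_episodes(episodes, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="", encoding="utf-8")` so the `csv` module controls line endings. Descriptions containing commas are quoted automatically by the writer.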

Features:

  • Multithreaded scraping of the media list, which greatly reduces the time taken for large media lists.
  • Can generate a media list from the folders and files in a specified directory or from user input.
  • Can specify the save location for scraped data.
  • Can specify search tags for the media list for a more accurate scrape.
  • Can choose to scrape all episode information for a TV show.
  • Can detect data that has already been scraped, which makes scraping new media added to an already scraped list very efficient.
  • Can recover missing scraped files without rescraping all data.
  • Can retry incomplete scrapes before exiting the program (successfully scraped files will not be altered or rescraped).
  • Currently only supports scraping data from IMDb.
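The multithreading, skip-if-already-scraped and retry features can be sketched together as below. This is not the program's actual implementation: `is_scraped`, `scrape_all`, the one-`.txt`-file-per-title layout and the `scrape_one` callback are assumptions made for the example. It relies only on `concurrent.futures`, which is part of the standard library from Python 3.2 onwards.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def is_scraped(title, save_dir):
    """Assumed layout: a '<title>.txt' file exists once an item is scraped."""
    return os.path.exists(os.path.join(save_dir, title + ".txt"))


def scrape_all(titles, save_dir, scrape_one, max_workers=8):
    """Scrape every title not yet saved, using a pool of worker threads.

    `scrape_one(title, save_dir)` is a caller-supplied function (hypothetical
    here) that fetches and saves one item and raises an exception on failure.
    Returns a dict mapping each attempted title to True (success) or False.
    """
    pending = [t for t in titles if not is_scraped(t, save_dir)]
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {t: pool.submit(scrape_one, t, save_dir) for t in pending}
        for title, future in futures.items():
            try:
                future.result()  # re-raises any exception from the worker
                results[title] = True
            except Exception:
                results[title] = False  # left for a later retry
    return results
```

Titles whose result is `False` can be passed to `scrape_all` again as a retry; items that were saved successfully are skipped by the `is_scraped` check and are never touched a second time.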

Usage:

The program is currently terminal-based.

Running the program using Python:

  • Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from the bundled executable file (created using pyinstaller):

  • Requirements: Windows 10
  • Creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.
    • The temporary files are deleted automatically, but they remain if the program is closed abruptly.
    • The 'temp' folder can be manually deleted after closing the program.
    • (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)

Updates:

For information on version history, read the HISTORY markdown file.

Releases (v1.3.0):

  • v1.3.0 (Dec 5, 2021)

    WebScrape v1.3.0

    See the version history document for all changes.

    Running the program using Python:

    Download the source code.

    Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)

    Running the program from the bundled executable:

    Download the WebScrape-1.3.0 zip file containing the bundled executable (created using pyinstaller).

    Requirements: Windows 10

    Note:

    The executable file creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.

    • The temporary files are deleted automatically, but they remain if the program is closed abruptly.
    • The 'temp' folder can be manually deleted after closing the program.
    • (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)

    Assets: Source code (tar.gz), Source code (zip), WebScrape-1.3.0.zip (8.71 MB)