A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data are movie and TV show information. The goal of the project is to show new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
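As a rough illustration of the Python cleaning step, the sketch below normalizes one scraped RT record. The field names (`title`, `critic_score`, `audience_score`) and the raw string formats are assumptions made for the example, not the project's actual schema.

```python
import re
from typing import Optional


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Turn a scraped score such as '93%' into an int, or None if absent."""
    if raw is None:
        return None
    match = re.search(r"\d{1,3}", raw)
    if match is None:
        return None
    value = int(match.group())
    return value if 0 <= value <= 100 else None


def clean_rt_record(record: dict) -> dict:
    """Return a cleaned copy of one hypothetical RT record."""
    return {
        # Collapse runs of whitespace left over from the HTML.
        "title": " ".join(record.get("title", "").split()),
        "critic_score": parse_score(record.get("critic_score")),
        "audience_score": parse_score(record.get("audience_score")),
    }


# Hypothetical raw record, as it might come out of a scraper.
raw = {"title": "  The  Example\nShow ", "critic_score": "93%", "audience_score": "--"}
print(clean_rt_record(raw))
```

Missing or unparseable scores become `None` here, so they can later be stored as NULL rather than as a misleading zero.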

All data are piped into a MySQL database, then retrieved for presentation in R.
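The load step could look roughly like the following. To keep the sketch runnable without a server, it uses Python's built-in `sqlite3` as a stand-in for MySQL; the table name `rt_titles` and its columns are hypothetical. With a MySQL driver, the connection call and the placeholder style would differ (for example, `%s` instead of `?`).

```python
import sqlite3

# In-memory SQLite database standing in for the project's MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE rt_titles (
        title TEXT PRIMARY KEY,
        critic_score INTEGER,
        audience_score INTEGER
    )
    """
)

# Hypothetical cleaned rows; None is stored as NULL.
cleaned_rows = [
    ("The Example Show", 93, 88),
    ("Another Title", 71, None),
]

# Parameterized inserts keep scraped text from being interpreted as SQL.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO rt_titles (title, critic_score, audience_score) VALUES (?, ?, ?)",
        cleaned_rows,
    )

row_count = conn.execute("SELECT COUNT(*) FROM rt_titles").fetchone()[0]
print(row_count)
```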

Here is a high-level look at the pipeline:

[Figure: pipeline diagram]

Data Source 1 is WON data. Data Source 2 is RT data.
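One way the two sources might be combined for presentation is a join on title, sketched below. The project retrieves data for presentation in R from MySQL; this example uses Python with `sqlite3` purely for illustration, and the table names, column names, and join key are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE won_titles (title TEXT, arrival_date TEXT);
    CREATE TABLE rt_titles (title TEXT, critic_score INTEGER, audience_score INTEGER);
    INSERT INTO won_titles VALUES ('The Example Show', '2022-03-01');
    INSERT INTO won_titles VALUES ('Unrated Title', '2022-03-15');
    INSERT INTO rt_titles VALUES ('The Example Show', 93, 88);
    """
)

# LEFT JOIN keeps every new arrival, even when no rating was found for it.
query = """
    SELECT w.title, w.arrival_date, r.critic_score, r.audience_score
    FROM won_titles AS w
    LEFT JOIN rt_titles AS r ON r.title = w.title
    ORDER BY w.arrival_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Joining on the raw title is fragile in practice (the two sites may spell or punctuate titles differently), so a real pipeline would likely need a normalized key.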

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy
