A repository with scraping code and a soccer dataset from understat.com.

Last update: Jan 03, 2023


UNDERSTAT - SHOTS DATASET

As many people interested in soccer analytics know, Understat is an amazing source of information. It provides Expected Goals (xG) stats for every shot taken in Europe's top five leagues, as well as the Russian league.

After watching an awesome tutorial by McKay Johns (great channel, by the way, with loads of resources for beginners in soccer analytics), I decided to write some code to scrape all the shot data available on Understat. The result is this dataset, which contains shot data from the 2014/2015 season through every match played in the 2020/2021 season, for the top division in each of the following countries:

England - EPL

Spain - La Liga

Germany - Bundesliga

Italy - Serie A

France - Ligue 1

Russia - RFPL
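Each of these leagues has its own page on Understat, which makes a natural starting point for scraping. The sketch below builds one league-page URL per league and season. The `https://understat.com/league/<slug>/<start year>` pattern and the slugs in `LEAGUE_SLUGS` are assumptions for illustration, not taken from this repository's code, and `league_season_urls` is a hypothetical helper.

```python
# Assumed Understat league slugs (illustrative, not taken from this repository's code).
LEAGUE_SLUGS = {
    "England": "EPL",
    "Spain": "La_liga",
    "Germany": "Bundesliga",
    "Italy": "Serie_A",
    "France": "Ligue_1",
    "Russia": "RFPL",
}

# Seasons are identified by their starting year: 2014 means 2014/2015.
SEASONS = range(2014, 2021)  # 2014/2015 through 2020/2021


def league_season_urls(base: str = "https://understat.com/league") -> list[str]:
    """Return one league-page URL per (league, season) pair, using an assumed URL pattern."""
    return [
        f"{base}/{slug}/{season}"
        for slug in LEAGUE_SLUGS.values()
        for season in SEASONS
    ]


if __name__ == "__main__":
    urls = league_season_urls()
    print(len(urls))  # 6 leagues x 7 seasons = 42
    print(urls[0])    # https://understat.com/league/EPL/2014
```

Seven seasons per league is also why each league folder ends up with seven files per data type.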

Besides shot data, I also scraped detailed season stats for every player who took part in these matches.
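Understat pages are commonly scraped by reading data embedded in the HTML as JavaScript variables of the form `var playersData = JSON.parse('...')`, where the JSON text is hex-escaped (`\x7B` for `{`, `\x22` for `"`, and so on). The sketch below pulls such a variable out of a page with only the standard library. The variable names (`playersData` on league pages, `shotsData` on match pages) and the escaping scheme are assumptions based on common Understat tutorials, not on this repository's code; `extract_embedded_json` is a hypothetical helper and the sample HTML is made up.

```python
import json
import re

# Matches a literal backslash, an "x" and two hex digits, e.g. \x7B.
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def extract_embedded_json(html: str, var_name: str):
    r"""Decode `var <var_name> = JSON.parse('...')` from page HTML.

    Assumes the JSON text sits in single quotes and uses \xNN escapes.
    """
    pattern = rf"var\s+{re.escape(var_name)}\s*=\s*JSON\.parse\('(.*?)'\)"
    match = re.search(pattern, html)
    if match is None:
        raise ValueError(f"{var_name} not found in page")
    # Replace only \xNN escapes; any \uXXXX escapes are left for json.loads.
    decoded = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), match.group(1))
    return json.loads(decoded)


# Made-up page fragment in the assumed format.
sample_html = (
    "<script>var playersData = JSON.parse('"
    r"[\x7B\x22player_name\x22:\x22Example Player\x22,\x22xG\x22:\x221.23\x22\x7D]"
    "')</script>"
)

if __name__ == "__main__":
    players = extract_embedded_json(sample_html, "playersData")
    print(players[0]["player_name"])  # Example Player
```

Decoding only the `\xNN` escapes, rather than running the whole string through Python's `unicode_escape` codec, avoids mangling any raw non-ASCII characters such as accented player names.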

The datasets are split into one folder per league, and each folder holds 7 .csv files of shot data and 7 .csv files of player data (one per season since 2014/15). The full dataset, with every league and season combined, is also available in the "datasets" folder. I plan on updating the datasets every day, but I have also uploaded the Python code that generates and updates them. Feel free to play with it and suggest improvements (hit me up on Twitter). To update the datasets yourself, save the "scraping" and "datasets" folders in the same parent folder, run Python with that parent folder as the current working directory, and then run the update.py script located in "scraping".
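Building the combined file amounts to concatenating every per-league, per-season CSV. Below is a minimal sketch with the standard library, assuming one subfolder per league and that all matching files share the same header. `combine_csvs` and the file-name pattern are hypothetical; this is not the repository's update.py.

```python
import csv
from pathlib import Path


def combine_csvs(league_root: Path, pattern: str, output: Path) -> int:
    """Concatenate every CSV matching league_root/<league>/<pattern> into one file.

    Assumes all input files share the same header. Returns the number of data rows written.
    """
    files = sorted(league_root.glob(f"*/{pattern}"))
    rows_written = 0
    writer = None
    with output.open("w", newline="", encoding="utf-8") as out:
        for path in files:
            with path.open(newline="", encoding="utf-8") as f:
                reader = csv.DictReader(f)
                if reader.fieldnames is None:  # skip empty files
                    continue
                if writer is None:
                    # Take the header from the first non-empty file.
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

With a hypothetical layout such as `datasets/EPL/shots_2020.csv`, a call like `combine_csvs(Path("datasets"), "shots_*.csv", Path("datasets/shots_all.csv"))` would write one combined shots file. Because the glob only looks one folder down, the output file at the top level is never picked up as an input.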

Most of the columns in the datasets are self-explanatory, but some aren't, so I uploaded a couple of .pdf files to the "documentation" folder that explain every column.
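As an example of working with the shot data, the sketch below totals goals and xG per player, since a player's (or team's) xG is the sum of the xG of their shots. It assumes the shots CSV keeps Understat's raw field names `player`, `result` (with `"Goal"` marking a goal) and `xG`. These names are assumptions, so check the PDFs in "documentation" for the actual columns. `goals_and_xg_by_player` is a hypothetical helper and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def goals_and_xg_by_player(csv_file) -> dict[str, tuple[int, float]]:
    """Return {player: (goals, total xG rounded to 2 decimals)} from a shots CSV.

    Column names `player`, `result` and `xG` are assumptions.
    """
    goals = defaultdict(int)
    xg = defaultdict(float)
    for row in csv.DictReader(csv_file):
        player = row["player"]
        xg[player] += float(row["xG"])
        if row["result"] == "Goal":
            goals[player] += 1
    return {p: (goals[p], round(xg[p], 2)) for p in xg}


# Made-up rows in the assumed format.
sample = io.StringIO(
    "player,result,xG\n"
    "Player A,Goal,0.76\n"
    "Player A,SavedShot,0.05\n"
    "Player B,MissedShots,0.12\n"
)

if __name__ == "__main__":
    print(goals_and_xg_by_player(sample))
    # {'Player A': (1, 0.81), 'Player B': (0, 0.12)}
```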


Owner

douglasbc

A Python library for automating interaction with websites.

Web Content Retrieval for Humans™

Script to scrape user data such as "id, username, fullname, followers, tweets, etc." using Twitter's search engine.

Parse feeds in Python

Simple scraper built in Python with the BeautifulSoup library

Examine.com supplement research scraper!

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

A high-level distributed crawling framework.

Scrapy-based cyber security news finder

Crawler job that scrapes comments from social media posts and saves them in an S3 bucket.

Find papers by keywords and venues, then download them automatically

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Simple HTTP & HTTPS proxy scraper and checker

A simple Reddit scraper to get memes (only images) from r/ProgrammerHumor.

A low-code tool that generates python crawler code based on curl or url

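Script for snapping up Moutai on JD.com: triggers automatically on a schedule, makes reservations automatically, and stops automatically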

Script used to download data for stocks.

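Uses Python to scrape the employment websites of several major universities in Jiangsu and offers three ways to notify users: via WeChat, direct command-line output, and Windows balloon notifications.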

A simple app to scrape data from Twitter.

Web scraping tool written in Python 3, using regex, to get CVEs, sources and URLs.