A spider for the Universal Online Judge (UOJ) system that converts problem pages to PDFs.

Last update: Dec 07, 2021

Related tags

Web Crawling uoj-spider

Overview

Universal Online Judge Spider

Introduction

This is a spider for the Universal Online Judge (UOJ) system (https://uoj.ac/).

It also works with any other online judge built on the UOJ system.

The spider is written in Python 3 and uses the Selenium WebDriver library for Python together with ChromeDriver.

It has only been tested on Ubuntu 20.04, so the commands in the following sections are written for that system.

Features

Automatic login, no need to obtain cookies manually.
Converts pages into PDFs with selectable, copyable text rather than plain screenshots.
Automatically detects when MathJax has finished loading, so that mathematical formulas in the results are rendered correctly.
Automatically skips pages that have already been downloaded (if the corresponding PDF file already exists locally).
Proxy support.
Support for all websites using the UOJ system.
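
The PDF conversion and skip-if-exists features could be sketched roughly as follows. This is a minimal sketch, not the program's actual code. It assumes a Chromium-based Selenium driver, whose `execute_cdp_cmd` method can issue the DevTools command `Page.printToPDF`. The function name `save_page_as_pdf` and the way the output path is chosen are hypothetical.

```python
import base64
import os


def save_page_as_pdf(driver, pdf_path):
    """Print the page currently loaded in `driver` to `pdf_path`.

    Returns False (and does nothing) if the file already exists,
    True if a new PDF was written.
    """
    if os.path.exists(pdf_path):
        return False  # already downloaded, so skip this page

    # Page.printToPDF is a Chrome DevTools Protocol command; it returns
    # the PDF as a base64-encoded string under the "data" key.
    result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})

    # Create the target folder if needed, then write the decoded bytes.
    os.makedirs(os.path.dirname(pdf_path) or ".", exist_ok=True)
    with open(pdf_path, "wb") as f:
        f.write(base64.b64decode(result["data"]))
    return True
```

Because Chrome prints the rendered page itself rather than capturing an image, the text in the resulting PDF stays selectable.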

Installation

1. Install Python 3 and ChromeDriver:

apt install python3 python3-pip chromium-browser chromium-chromedriver

2. Install the Selenium library for Python 3:

pip3 install selenium

3. Download this program

Usage

First, set the following variables:

# [Basic settings]
url = ""
username = ""
password = ""
start_number = 1
end_number = 100
save_dir = "downloads"

# [Advanced settings]
proxy = ""
page_404_title = "404 - "
max_login_time = 60
max_mathjax_start_time = 60
max_mathjax_load_time = 60

Basic settings

url: the index URL of your target, e.g. https://uoj.ac/. Please note that the value must end in a slash /.
username: your username.
password: your password.
start_number: the number of the first problem to crawl (lower bound).
end_number: the number of the last problem to crawl (upper bound).
save_dir: the name of the folder where the result will be stored.
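
To illustrate how the basic settings fit together, here is a minimal sketch that validates them and lists the pages to visit. It is not taken from the program itself: the helper name `problem_urls` is hypothetical, and the `problem/<number>` path is an assumption based on how problem pages are addressed on https://uoj.ac/.

```python
def problem_urls(url, start_number, end_number):
    """Yield (number, problem_url) for each problem in the inclusive range."""
    if not url.endswith("/"):
        raise ValueError("url must end in a slash, e.g. https://uoj.ac/")
    if start_number > end_number:
        raise ValueError("start_number must not exceed end_number")
    for number in range(start_number, end_number + 1):
        # Assumption: problem pages live at <url>problem/<number>.
        yield number, f"{url}problem/{number}"


if __name__ == "__main__":
    for number, link in problem_urls("https://uoj.ac/", 1, 3):
        print(number, link)
```

The trailing-slash check mirrors the requirement stated for `url`, so that the base URL and the path can be joined by plain concatenation.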

Advanced settings

If you don't know what the advanced settings are for, it is probably better to leave them unchanged.

proxy: the address of your proxy server, e.g. HTTP://127.0.0.1:1080, or SOCKS5://127.0.0.1:1081. Leave it blank (empty string) if you do not need to use a proxy.
page_404_title: the title of the OJ's 404 page. You may use a substring of the title, such as "404 - ". If the program gets a page title that contains this string, the download of that page is skipped.
max_login_time: the maximum waiting time for a login attempt, in seconds.
max_mathjax_start_time: the maximum wait time for a MathJax loading message to appear, in seconds.
max_mathjax_load_time: the maximum wait time for a MathJax loading message to disappear (i.e. MathJax rendering is finished), in seconds.
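
The advanced settings can be pictured with the sketch below. It is only an illustration under stated assumptions, not the program's real implementation: all function names are hypothetical, `message_visible` stands in for whatever check the program uses to see the MathJax loading message, and treating "the message never appeared" as "nothing to render" is an assumption. The proxy arguments would typically be passed to Selenium's `ChromeOptions.add_argument`.

```python
import time


def chrome_arguments(proxy):
    """Return extra Chrome command-line arguments for the proxy setting."""
    # --proxy-server is a Chromium switch; an empty string means no proxy.
    return [f"--proxy-server={proxy}"] if proxy else []


def is_404_page(title, page_404_title):
    """True if the page title contains the configured 404 marker."""
    return page_404_title in title


def wait_until(condition, timeout, interval=0.5):
    """Poll condition() until it is truthy or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_mathjax(message_visible, max_start_time, max_load_time, interval=0.5):
    """Two-phase wait: the loading message appears, then disappears.

    Returns True if rendering appears to be finished, False if the
    message was still visible after max_load_time seconds.
    """
    # Phase 1: wait for the loading message to show up.
    if not wait_until(message_visible, max_start_time, interval):
        return True  # assumption: no message means there is nothing to render
    # Phase 2: wait for the message to go away again.
    return wait_until(lambda: not message_visible(), max_load_time, interval)
```

Splitting the wait into two phases avoids printing a page too early: checking only for the absence of the message would succeed immediately on a page where MathJax has not started yet.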

After completing the setup, run:

python3 main.py

Sample result

License

MIT License.

Owner

TriNitroTofu

Simple web scraper bot to scrape webpages using Requests, html5lib and BeautifulSoup.

Simple tool to scrape and download cross country ski timings and results from live.skidor.com

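Automatic quiz answering and score farming for the "Four Histories" on China University Students Online (currently supports only the Heroes chapter)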

Web3 Pancakeswap Sniper bot written in python3

Twitter Claimer / Swapper / Turbo - Proxyless - Multithreading

This is a python api to scrape search results from a url.

Binance Smart Chain Contract Scraper + Contract Evaluator

A dead simple crawler to get books information from Douban.

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Web Scraping Practica With Python

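Latest optimized version of a Taobao Moutai purchasing bot: Taobao Moutai flash-sale sniping, with an optimized purchase thread queue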

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Anonymously scrapes onlinesim.ru for new usable phone numbers.

Scraping the data from each page of biocides listed on the BAUA website into a CSV file

Amazon scraper using scrapy, a python framework for crawling websites.

IGLS - Instagram Like Scraper CLI tool

A crawler of doubamovie

A simple proxy scraper that utilizes the requests module in python.

The first public repository that provides free BUBT website scraping API script on Github.

Scrapy, a fast high-level web crawling & scraping framework for Python.
