A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"

Introduction

This script collects attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.

Functions

Scripts currently available:

Name	Description
`search.py`	Searches for a string within a given time interval and records every post in the search results.

Parameters of `search.py` (edit them at the head of the script):
`search_string`: The string to search for. All posts containing this string are recorded, up to 50 pages of results.
`start_time`: Only posts published after this time are recorded (hour-level precision).
`end_time`: Only posts published before this time are recorded (hour-level precision).
`rest_time`: The interval between two requests, in seconds.

Results are saved in Python pickle format at `results/weibo-{search_string}-{start_time}-{end_time}.pkl`. The `start_time` and `end_time` in the file name are formatted as Unix timestamps in seconds.
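As a rough illustration of the parameters and file name pattern described above, the sketch below sets example values and builds the matching output path. The concrete values, the use of timezone-aware `datetime` objects, the UTC+8 offset, and the helper `result_path` are all assumptions made for this example. The actual types expected by `search.py` may differ.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical parameter values; the real script may expect other types.
CST = timezone(timedelta(hours=8))  # assumption: times are given in UTC+8
search_string = "python"
start_time = datetime(2022, 8, 1, 0, tzinfo=CST)  # hour-level precision
end_time = datetime(2022, 8, 2, 0, tzinfo=CST)
rest_time = 3  # seconds to wait between two requests


def result_path(search_string, start_time, end_time, root="results"):
    """Build results/weibo-{search_string}-{start}-{end}.pkl.

    Both times are written as Unix timestamps in whole seconds.
    """
    start_ts = int(start_time.timestamp())
    end_ts = int(end_time.timestamp())
    return Path(root) / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


print(result_path(search_string, start_time, end_time))
```

Using timezone-aware values keeps the timestamps in the file name independent of the machine's local time zone, which may or may not match what the script itself does.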

Installation

1. Run `pip install -r requirements.txt`.
2. Find the script you need in the "Functions" section.
3. Edit the parameters at the head of the script.
4. Run the script with Python.
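Once a script has run, its pickle output can be inspected from Python. The following is a minimal sketch: the helpers `load_results` and `summarize` are hypothetical, and because the structure of the stored object is not documented here, the code only reports its type and length. Pickle files can execute arbitrary code when loaded, so only open files you produced yourself.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load one pickled result file and return the stored object."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(obj):
    """Return the object's type name and its length (None if it has no length)."""
    size = len(obj) if hasattr(obj, "__len__") else None
    return type(obj).__name__, size


def summarize_all(root="results"):
    """Summarize every weibo-*.pkl file found under the given directory."""
    return {
        path.name: summarize(load_results(path))
        for path in sorted(Path(root).glob("weibo-*.pkl"))
    }
```

Calling `summarize_all()` from the repository root would list each result file with the type and size of its contents, which is a reasonable first step before writing code that depends on specific fields.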

Script used to download data for stocks.

Twitter Claimer / Swapper / Turbo - Proxyless - Multithreading

iQIYI membership, Tencent Video, Bilibili, Baidu, and other automatic check-ins

Scrapes all articles and their headlines from theonion.com

Nekopoi scraper using python3

Web-scraping - A bot using Python with BeautifulSoup that scrapes the IRS website by form number and returns the results as JSON

This is a script that scrapes the longitude and latitude on food.grab.com

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Console application for downloading images from Reddit in Python

Iptvcrawl - A scrapy project for crawling IPTV playlists

👨🏼‍⚖️ reddit bot that turns comment chains into ace attorney scenes

A distributed crawler for weibo, built with celery and requests.

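Crawls the career websites of several major universities in Jiangsu with Python and notifies users in three ways: a WeChat message, direct command-line output, or a Windows balloon notification.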

This script is intended to crawl license information of repositories through the GitHub API.

A Python crawler that automatically fetches the lyrics of all songs by a given artist on QQ Music

Dex-scrapper - Hobby project for scraping DEX data on VeChain

A modern CSS selector implementation for BeautifulSoup

mlscraper: Scrape data from HTML pages automatically with Machine Learning

Scraping script for stats on covid19 pandemic status in Chiba prefecture, Japan

SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features.