A web crawler for recording posts in "sina weibo"

Overview

Web Crawler for "sina weibo"

A web crawler for recording posts in "sina weibo"

Introduction

This script helps collect attributes of posts in "sina weibo". Users can record posts in different lists (or flows, or collections), like the searching results. The supported lists (or flows, or collections) are listed in "Functions" section.

Functions

Scripts currently available:

Name Description
search.py Search for a word and specific time interval and record all posts, the search result.
Parameters: (Edit these parameters at the head of the script.)
search_string: The string to search for. All posts containing this string will be recorded, 50 pages at most.
start_time: Only posts which are posted after this time will be recorded. (Accurate to hour level)
end_time: Only posts which are posted before this time will be recorded. (Accurate to hour level)
rest_time: The interval between two requests, where the unit is second.
Results are saved as Python pickle format at results/weibo-{search_string}-{start_time}-{end_time}.pkl. The start_time and end_time in filename are formatted as Unix timestamp (the unit is second).

Installation

  1. Run pip install -r requirements.txt.
  2. According to "Function" section, find the script you need.
  3. Edit parameters at the head of the script.
  4. Run the script with Python.
Script used to download data for stocks.

This script is useful for downloading stock market data for a wide range of companies specified by their respective tickers. The script reads in the d

Carmelo Gonzales 71 Oct 04, 2022
Twitter Claimer / Swapper / Turbo - Proxyless - Multithreading

Twitter Turbo / Auto Claimer / Swapper Version: 1.0 Last Update: 01/26/2022 Use this at your own descretion. I've only used this on test accounts and

Underscores 6 May 02, 2022
爱奇艺会员,腾讯视频,哔哩哔哩,百度,各类签到

My-Actions 个人收集并适配Github Actions的各类签到大杂烩 不要fork了 ⭐️ star就行 使用方式 新建仓库并同步代码 点击Settings - Secrets - 点击绿色按钮 (如无绿色按钮说明已激活。直接到下一步。) 新增 new secret 并设置 Secr

280 Dec 30, 2022
Scrapes all articles and their headlines from theonion.com

The Onion Article Scraper Scrapes all articles and their headlines from the satirical news website https://www.theonion.com Also see Clickhole Article

0 Nov 17, 2021
Nekopoi scraper using python3

Features Scrap from url Todo [+] Search by genre [+] Search by query [+] Scrap from homepage Example # Hentai Scraper from nekopoi import Hent

MhankBarBar 9 Apr 06, 2022
Web-scraping - A bot using Python with BeautifulSoup that scraps IRS website by form number and returns the results as json

Web-scraping - A bot using Python with BeautifulSoup that scraps IRS website (prior form publication) by form number and returns the results as json. It provides the option to download pdfs over a ra

1 Jan 04, 2022
This is a script that scrapes the longitude and latitude on food.grab.com

grab This is a script that scrapes the longitude and latitude for any restaurant in Manila on food.grab.com, location can be adjusted. Search Result p

0 Nov 22, 2021
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Parsel Parsel is a BSD-licensed Python library to extract and remove data from HTML and XML using XPath and CSS selectors, optionally combined with re

Scrapy project 859 Dec 29, 2022
Console application for downloading images from Reddit in Python

RedditImageScraper Console application for downloading images from Reddit in Python Introduction This short Python script was created for the mass-dow

James 0 Jul 04, 2021
Iptvcrawl - A scrapy project for crawl IPTV playlist

iptvcrawl a scrapy project for crawl IPTV playlist. Dependency Python3 pip insta

Zhijun 18 May 05, 2022
👨🏼‍⚖️ reddit bot that turns comment chains into ace attorney scenes

Ace Attorney reddit bot 👨🏼‍⚖️ Reddit bot that turns comment chains into ace attorney scenes. You'll need to sign up for streamable and reddit and se

763 Nov 17, 2022
A distributed crawler for weibo, building with celery and requests.

A distributed crawler for weibo, building with celery and requests.

SpiderClub 4.8k Jan 03, 2023
用python爬取江苏几大高校的就业网站,并提供3种方式通知给用户,分别是通过微信发送、命令行直接输出、windows气泡通知。

crawler_for_university 用python爬取江苏几大高校的就业网站,并提供3种方式通知给用户,分别是通过微信发送、命令行直接输出、windows气泡通知。 环境依赖 wxpy,requests,bs4等库 功能描述 该项目基于python,通过爬虫爬各高校的就业信息网,爬取招聘信

8 Aug 16, 2021
This script is intended to crawl license information of repositories through the GitHub API.

GithubLicenseCrawler This script is intended to crawl license information of repositories through the GitHub API. Taking a csv file with requirements.

schutera 4 Oct 25, 2022
一款利用Python来自动获取QQ音乐上某个歌手所有歌曲歌词的爬虫软件

QQ音乐歌词爬虫 一款利用Python来自动获取QQ音乐上某个歌手所有歌曲歌词的爬虫软件,默认去除了所有演唱会(Live)版本的歌曲。 使用方法 直接运行python run.py即可,然后输入你想获取的歌手名字,然后静静等待片刻。 output目录下保存生成的歌词和歌名文件。以周杰伦为例,会生成两

Yang Wei 11 Jul 27, 2022
Dex-scrapper - Hobby project for scrapping dex data on VeChain

Folders /zumo_abis # abi extracted from zumo repo /zumo_pools # runtime e

3 Jan 20, 2022
A modern CSS selector implementation for BeautifulSoup

Soup Sieve Overview Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filter

Isaac Muse 151 Dec 23, 2022
mlscraper: Scrape data from HTML pages automatically with Machine Learning

🤖 Scrape data from HTML websites automatically with Machine Learning

Karl Lorey 798 Dec 29, 2022
Scraping script for stats on covid19 pandemic status in Chiba prefecture, Japan

About 千葉県の地域別の詳細感染者統計(Excelファイル) をCSVに変換し、かつ地域別の日時感染者集計値を出力するスクリプトです。 Requirement POSIX互換なシェル, e.g. GNU Bash (1) curl (1) python = 3.8 pandas = 1.1.

Conv4Japan 1 Nov 29, 2021
SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features.

SearchifyX SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features. SearchifyX lets you

28 Dec 20, 2022