Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil

Overview

Shopee Scraper

A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil.

The project was written in Python 3 and needs only three libraries: requests, date (most likely the date class of the standard datetime module) and time.

datetime and time are part of Python's standard library on Linux, macOS and Windows, so they never need to be installed with pip. Only requests is a third-party package that you may have to install.

You can easily install requests using the following command: $ pip install requests

The script relies on Shopee's public API. Shopee builds its product pages dynamically, loading the products and their information from a JSON response. Since that API is public, it's easier to request the JSON directly and extract the data from it than to select divs and classes, scroll through the results and use Selenium to simulate a web browser.
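The idea can be sketched as follows. This is a minimal illustration, not the script's actual code: the URL and parameters are left to the caller, and the JSON keys used here (`items`, `name`, `price`, `stock`, `sold`) are hypothetical placeholders, since the real endpoint and response layout are not described in this README.

```python
import json


def fetch_json(url, params):
    """Request a JSON document and return it as Python objects.

    `url` and `params` are supplied by the caller; no real Shopee
    endpoint is assumed here.
    """
    # Imported lazily so the parsing helper below works without requests.
    import requests

    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def extract_rows(payload):
    """Turn a decoded JSON payload into flat rows.

    The keys used here ("items", "name", "price", "stock", "sold")
    are hypothetical; adapt them to the real response.
    """
    rows = []
    for item in payload.get("items", []):
        rows.append(
            {
                "name": item.get("name"),
                "price": item.get("price"),
                "stock": item.get("stock"),
                "sold": item.get("sold"),
            }
        )
    return rows


# A made-up payload standing in for what an API might return.
sample = json.loads(
    '{"items": [{"name": "T-shirt", "price": 49.9, "stock": 12, "sold": 340}]}'
)
rows = extract_rows(sample)
print(rows)
```

Keeping the network call and the parsing in separate functions makes it possible to try the parsing on a saved response before sending any requests.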

How to use it

  1. First, find the seller's id. It appears in the product link.

Example: https://shopee.com.br/Camisetas-Bandas-Rock-RHCP-Red-Hot-Chili-Peppers-100-Algodao!!-i.409068735.3983196792

  • 409068735 is the seller's id. It's required to run the script.
  • 3983196792 is the product's id.
  2. Before running the code, set the directory where the generated csv file, which will contain all the extracted data, should be saved.
  • file=open("/YOUR-DIRECTORY/%s-YOUR-FILE-NAME.csv" % data, "a"))
  • The %s- right before the file name is replaced with the date on which the csv was generated. It's recommended to keep it that way so you can keep track of your files.
  3. Using the terminal, go to the script's folder and run:
  • python3 shopee-scraper.py
  • Type in the seller's id you just got from the product link.
  • The script scrapes up to 999 published products at about 1 second per listing, so a run may take some time depending on the number of products (roughly 17 minutes for 999 products).
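
Steps 1 and 2 can also be done in code. The sketch below is an illustration under assumptions, not part of the script: `parse_product_link` and `csv_path` are hypothetical helper names, the link is assumed to end with `-i.<seller id>.<product id>` as in the example, and the ISO date format in the file name is a guess, since the format the script uses is not stated here.

```python
import re
from datetime import date


def parse_product_link(link):
    """Return (seller_id, product_id) from a Shopee product link.

    Assumes the link ends with "-i.<seller id>.<product id>", as in the
    example above; an optional query string is tolerated.
    """
    match = re.search(r"-i\.(\d+)\.(\d+)(?:\?.*)?$", link)
    if match is None:
        raise ValueError("no seller/product id found in link")
    return match.group(1), match.group(2)


def csv_path(directory, name, day=None):
    """Build "<directory>/<date>-<name>.csv", mirroring the %s- prefix."""
    day = day or date.today()
    return "%s/%s-%s.csv" % (directory.rstrip("/"), day.isoformat(), name)


link = (
    "https://shopee.com.br/Camisetas-Bandas-Rock-RHCP-Red-Hot-Chili-"
    "Peppers-100-Algodao!!-i.409068735.3983196792"
)
seller_id, product_id = parse_product_link(link)
print(seller_id, product_id)
print(csv_path("/YOUR-DIRECTORY", "YOUR-FILE-NAME", date(2022, 1, 31)))
```

The first id in the pair is the value to type in when the script asks for the seller's id.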

Why I created this project and who I am

  • I'm a Computer Engineering and Mathematics major in Brazil. I already have a bachelor's degree in Marketing and I'm looking for a Data Engineer or Data Scientist position.
  • I currently work for a small company in Brazil as a commercial manager, and my main role is to increase the online sales of hydraulic and brass connectors for gas and petroleum.
  • I love data and statistics. Finding new possibilities and ways of doing things better and faster through data is fascinating, and quoting Carl Sagan I would say that "it's a pleasure to share a planet and an epoch with you", because humankind doesn't even know yet what we're capable of. AI and machine learning will show us a new world, a new age.
  • I really like the feeling of helping companies make better data-driven decisions on online sales, marketing and purchasing. Solving problems is pretty much the main motivation of any mathematician or engineer.
Owner
Paulo DaRosa
Computer Engineer, Mathematician and Marketer.