A web scraper for nomadlist.com, made to avoid website restrictions.

Related tags

Web Crawlinggypsylist
Overview

Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomad people, to find the best places to live and work remotely as a location independent remote worker. Unfortunately most of these contents are restricted if you are not member of this website.

This script doesn't cover all of the information retrievable from the website, but it's just an entry point to evaluate this without to sign up.

Installation

Before to use gypsylist you have to install some requirements:

pip3 install -r requirements.txt

Additionally, having selenium as dependency, you have also to setup the browser driver. To install this, please, take a look here: https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.

Usage

To use gypsylist, at first, browse the nomadlist.com website and apply the filters you need to do your research. Now, get the url path from the address bar of your browser (as shown below):

And use this to scrape with gypsylist:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

This is going to be the expected result:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!

Known Issues

This is not what you can call "a well written code" (sorry Gods of programming for this). For this reason there are several code smell or bugs that are not under review (due to the short time I dedicated to write the script).

  • Using --headless / -H parameter to set the browser in headless mode, you will retrieve just the first page contents from the website.
Owner
Alessio Greggi
Computer Scientist graduated at the University of Rome, Tor Vergata. Currently working as Linux Engineer. CTF Player during free time.
Alessio Greggi
让中国用户使用git从github下载的速度提高1000倍!

序言 github上有很多好项目,但是国内用户连github却非常的慢.每次都要用插件或者其他工具来解决. 这次自己做一个小工具,输入github原地址后,就可以自动替换为代理地址,方便大家更快速的下载. 安装 pip install cit 主要功能与用法 主要功能 change 将目标地址转换为

35 Aug 29, 2022
Async Python 3.6+ web scraping micro-framework based on asyncio

Ruia 🕸️ Async Python 3.6+ web scraping micro-framework based on asyncio. ⚡ Write less, run faster. Overview Ruia is an async web scraping micro-frame

howie.hu 1.6k Jan 01, 2023
A command-line program to download media, like and unlike posts, and more from creators on OnlyFans.

onlyfans-scraper A command-line program to download media, like and unlike posts, and more from creators on OnlyFans. Installation You can install thi

185 Jul 23, 2022
Footballmapies - Football mapies for learning webscraping and use of gmplot module in python

Footballmapies - Football mapies for learning webscraping and use of gmplot module in python

1 Jan 28, 2022
Demonstration on how to use async python to control multiple playwright browsers for web-scraping

Playwright Browser Pool This example illustrates how it's possible to use a pool of browsers to retrieve page urls in a single asynchronous process. i

Bernardas Ališauskas 8 Oct 27, 2022
Google Maps crawler using Selenium

Google Maps Crawler using Selenium Built as part of the Antifragile Dev Project Selenium crawler that browses Google Maps as a regular user and stores

Guilherme Latrova 46 Dec 16, 2022
一些爬虫相关的签名、验证码破解

cracking4crawling 一些爬虫相关的签名、验证码破解,目前已有脚本: 小红书App接口签名(shield)(2020.12.02) 小红书滑块(数美)验证破解(2020.12.02) 海南航空App接口签名(hnairSign)(2020.12.05) 说明: 脚本按目标网站、App命

XNFA 90 Feb 09, 2021
Crawler job that scrapes comments from social media posts and saves them in a S3 bucket.

Toxicity comments crawler Crawler job that scrapes comments from social media posts and saves them in a S3 bucket. Twitter Tweets and replies are scra

Douglas Trajano 2 Jan 24, 2022
A Python package that scrapes Google News article data while remaining undetected by Google.

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https

Geminid Systems, Inc 6 Aug 10, 2022
A web scraper which checks price of a product regularly and sends price alerts by email if price reduces.

Amazon-Web-Scarper Created a web scraper using simple functions to check price of a product on amazon (can be duplicated to check price at other marke

Swaroop Todankar 1 Jan 17, 2022
Create crawler get some new products with maximum discount in banimode website

crawler-banimode create crawler and get some new products with maximum discount in banimode website. این پروژه کوچک جهت یادگیری و کار با ابزار سلنیوم

nourollah rezaei 2 Feb 17, 2022
优化版本的京东茅台抢购神器

优化版本的京东茅台抢购神器

1.8k Mar 18, 2022
A dead simple crawler to get books information from Douban.

Introduction A dead simple crawler to get books information from Douban. Pre-requesites Python 3 Install dependencies from requirements.txt (Optional)

Yun Wang 1 Jan 10, 2022
Python script to check if there is any differences in responses of an application when the request comes from a search engine's crawler.

crawlersuseragents This Python script can be used to check if there is any differences in responses of an application when the request comes from a se

Podalirius 13 Dec 27, 2022
Python Web Scrapper Project

Web Scrapper Projeto desenvolvido em python, sobre tudo com Selenium, BeautifulSoup e Pandas é um web scrapper que puxa uma tabela com as principais e

Jordan Ítalo Amaral 2 Jan 04, 2022
This repo has the source code for the crawler and data crawled from auto-data.net

This repo contains the source code for crawler and crawled data of cars specifications from autodata. The data has roughly 45k cars

Tô Đức Anh 5 Nov 22, 2022
A simple app to scrap data from Twitter.

Twitter-Scraping-App A simple app to scrap data from Twitter. Available Features Search query. Select number of data you want to fetch from twitter. C

Davis David 2 Oct 31, 2022
Simple library for exploring/scraping the web or testing a website you’re developing

Robox is a simple library with a clean interface for exploring/scraping the web or testing a website you’re developing. Robox can fetch a page, click on links and buttons, and fill out and submit for

Dan Claudiu Pop 79 Nov 27, 2022
Telegram Group Scrapper

this programe is make your work so much easy on telegrame. do you want to send messages on everyone to your group or others group. use this script it will do your work automatically with one click. a

HackArrOw 3 Dec 03, 2022
Goblyn is a Python tool focused to enumeration and capture of website files metadata.

Goblyn Metadata Enumeration What's Goblyn? Goblyn is a tool focused to enumeration and capture of website files metadata. How it works? Goblyn will se

Gustavo 46 Nov 22, 2022