Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomads, helping location-independent remote workers find the best places to live and work. Unfortunately, most of this content is restricted to members of the website.

This script doesn't cover all of the information available on the website; it is just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Additionally, since gypsylist depends on Selenium, you also have to set up a browser driver. For installation instructions, see: https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/
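
To check whether a driver is already reachable on your PATH before running the script, a small sketch like the following may help. It is a hypothetical helper, not part of gypsylist: `find_driver` is an illustrative name, and the driver names `chromedriver` and `geckodriver` are assumptions, since this README does not say which browser gypsylist drives.

```python
import shutil
from typing import Iterable, Optional

# Hypothetical helper, not part of gypsylist. The names below are common
# WebDriver executables for Chrome and Firefox; which browser gypsylist
# actually uses is an assumption.
DRIVER_NAMES = ("chromedriver", "geckodriver")


def find_driver(names: Iterable[str] = DRIVER_NAMES) -> Optional[str]:
    """Return the full path of the first driver found on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # None if the executable is not on PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    driver = find_driver()
    print(f"Found browser driver: {driver}" if driver else "No browser driver found on PATH")
```

If it reports that no driver was found, follow the Selenium installation guide linked above.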

Now you should be ready to run the script.

Usage

To use gypsylist, first browse the nomadlist.com website and apply the filters you need for your research. Then copy the URL path (everything after "nomadlist.com/") from your browser's address bar and use it to scrape with gypsylist:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji
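
If you prefer to paste the whole URL rather than trim it by hand, the conversion can be scripted. This is a minimal sketch using only the standard library; `url_to_path` is a hypothetical helper (not part of gypsylist), and it assumes that `--path` expects everything after "nomadlist.com/", including the query string, as in the example command above.

```python
import shlex
from urllib.parse import urlsplit


def url_to_path(url: str) -> str:
    """Keep only the path and query string of a nomadlist.com URL.

    Hypothetical helper, not part of gypsylist: it assumes --path expects
    everything after "nomadlist.com/", as in the example command.
    """
    parts = urlsplit(url)
    path = parts.path.lstrip("/")  # drop the leading slash
    if parts.query:
        path = f"{path}?{parts.query}"  # keep sort/order/filter parameters
    return path  # any "#fragment" is discarded


if __name__ == "__main__":
    url = "https://nomadlist.com/safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc"
    # shlex.quote protects "?" and "&" from the shell.
    print(f"./gypsylist.py --path {shlex.quote(url_to_path(url))} --emoji")
```

Quoting the path matters because an unquoted "&" would make the shell run the command in the background and cut off the rest of the query string.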

The expected output looks like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember to pass --emoji.) Have fun!
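
Since the output is plain text, it can be post-processed. Below is a hypothetical parser (not part of gypsylist; `parse_output` is an illustrative name) that turns each "#N" record into a Python dictionary and converts "n/5" scores to integers. It assumes the exact format shown above: each record starts with a "#N" line, followed by one "key: value" field per line.

```python
import re
from typing import Dict, List, Union

# Hypothetical helper, not part of gypsylist: parses the --emoji output format.
RANK_RE = re.compile(r"^#(\d+)$")          # record header, e.g. "#440"
FIELD_RE = re.compile(r"([a-z]+): (.+)$")  # field line, e.g. "💵 cost: 1/5"
SCORE_RE = re.compile(r"^(\d+)/5$")        # score value, e.g. "4/5"

Record = Dict[str, Union[int, str]]


def parse_output(text: str) -> List[Record]:
    """Turn gypsylist --emoji output into a list of dicts, one per city."""
    records: List[Record] = []
    current: Record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the "..." elision marker
        header = RANK_RE.match(line)
        if header:
            current = {"rank": int(header.group(1))}
            records.append(current)
            continue
        field = FIELD_RE.search(line)  # search() skips the leading emoji
        if field and current:  # ignore field lines that appear before any "#N"
            key, value = field.group(1), field.group(2).strip()
            score = SCORE_RE.match(value)
            # Keep "n/5" scores as integers and everything else as text.
            current[key] = int(score.group(1)) if score else value
    return records
```

For example, `[r["city"] for r in parse_output(text) if r.get("fun") == 5]` lists the cities with the top fun score, where `text` holds the captured output.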

Known Issues

This is not what you would call "well-written code" (sorry, Gods of programming). As a result, it contains several code smells and bugs that have not been addressed, due to the short time I dedicated to writing the script.

When you use the --headless / -H parameter to run the browser in headless mode, only the first page of results is retrieved from the website.

Owner

Alessio Greggi
