A Python script that scrapes company overviews and reviews from Glassdoor.

Last update: Jun 23, 2022

Overview

Data Scraping for Glassdoor

This Python script scrapes the overview and reviews of companies from Glassdoor. Please use it carefully and review Glassdoor's Terms of Service, which explicitly prohibit web scraping.

Built With

Python
ChromeDriver

Getting Started

Download the SeleniumGlassdor.py file and change the chromedriver path in it to match the location on your machine. Supply your own CSV file containing the Glassdoor URLs of the companies you want to scrape; the company URL CSV file is also attached here. That file was generated with Selenium as well, by searching Google for 'glassdoor' plus the company name and extracting the URL from the first result. I can also upload the file on request.
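
The URL lookup described above could be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the function names are hypothetical, and filtering the collected links by the `glassdoor.com` domain is an assumption about how "the first result" is picked. Collecting the links from the results page is left to Selenium.

```python
from urllib.parse import quote_plus, urlparse


def google_search_url(company_name):
    """Build a Google search URL for 'glassdoor' plus the company name."""
    return "https://www.google.com/search?q=" + quote_plus(f"glassdoor {company_name}")


def first_glassdoor_link(hrefs):
    """Return the first link hosted on glassdoor.com, or None if there is none."""
    for href in hrefs:
        if not href:
            continue  # anchors without an href yield None or ""
        host = urlparse(href).netloc.lower()
        if host == "glassdoor.com" or host.endswith(".glassdoor.com"):
            return href
    return None
```

Here `hrefs` would be the `href` values of the anchor elements found on the search results page, in page order.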

Prerequisites

Install Selenium before running the script.

selenium
```
pip install selenium
```
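
With Selenium installed, the setup from Getting Started might look like the sketch below. The function names, the `url` column name, and the paths passed in are placeholders (assumptions, not taken from the script); adjust them to your machine and your CSV file.

```python
import csv


def load_company_urls(csv_path, column="url"):
    """Read Glassdoor company URLs from a CSV file that has a header row."""
    urls = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            url = (row.get(column) or "").strip()
            if url:  # skip rows with an empty URL cell
                urls.append(url)
    return urls


def make_browser(chromedriver_path):
    """Start Chrome through a local chromedriver binary (Selenium 4 style)."""
    # Imported here so the CSV helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def visit_companies(chromedriver_path, csv_path):
    """Open every company page listed in the CSV, then close the browser."""
    browser = make_browser(chromedriver_path)
    try:
        for url in load_company_urls(csv_path):
            browser.get(url)
    finally:
        browser.quit()
```

A call such as `visit_companies("/path/to/chromedriver", "company_urls.csv")` would then walk through the list; both arguments are placeholder values.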

For the other sections

If you want to scrape data from other sections, such as jobs or salaries, you can first extract each section's URL as shown here and then download the section with a similar method.

from selenium.webdriver.common.by import By

reviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Reviews']").get_attribute('href')
jobsUrl = browser.find_element(By.XPATH, "//a[@data-label='Jobs']").get_attribute('href')
salariesUrl = browser.find_element(By.XPATH, "//a[@data-label='Salaries']").get_attribute('href')
interviewsUrl = browser.find_element(By.XPATH, "//a[@data-label='Interviews']").get_attribute('href')
benefitsUrl = browser.find_element(By.XPATH, "//a[@data-label='Benefits']").get_attribute('href')
photosUrl = browser.find_element(By.XPATH, "//a[@data-label='Photos']").get_attribute('href')
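
Since the six lookups differ only in the `data-label` value, they can be folded into one helper. The following is a sketch under assumptions: `section_urls` and `SECTION_LABELS` are hypothetical names, not part of the script, and it uses `find_elements` so that a section missing from a company page is skipped rather than raising an exception.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # By.XPATH is the plain string "xpath"; this fallback only keeps the
    # helper importable on a machine without Selenium.
    XPATH = "xpath"

# Section labels taken from the data-label values used above.
SECTION_LABELS = ("Reviews", "Jobs", "Salaries", "Interviews", "Benefits", "Photos")


def section_urls(browser, labels=SECTION_LABELS):
    """Map each section label to its href on the currently loaded company page."""
    urls = {}
    for label in labels:
        matches = browser.find_elements(XPATH, f"//a[@data-label='{label}']")
        if matches:  # sections absent from the page are left out
            urls[label] = matches[0].get_attribute("href")
    return urls
```

After `browser.get(company_url)`, calling `section_urls(browser)` returns a dictionary such as `{"Reviews": ..., "Jobs": ...}` that can be looped over to visit each section.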

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Houping - [email protected]

Owner

Houping
