Scraping and analysis of leetcode-compensations page.

Overview

Leetcode compensations report

Scraping and analysis of leetcode-compensations page.

Salary Distribution Salary

Report

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / fixed salary, dark mode

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary

INDIA : 5th Jan 2019 - 5th Aug 2021 / total salary, dark mode

Directory structure

  • data
    • imgs - images for reports
    • logs - scraping logs
    • mappings - standardized company, location and title mappings as well as unmapped entities
    • meta - meta information for the posts like post_id, date, title, href.
    • out - data from info.all_info.get_clean_records_for_india()
    • posts - text from the post
    • reports - salary analysis by companies, titles and experience
  • info - functions to posts data(along with the standardized entities) in a tabular format
  • leetcode - scraper
  • utils - constants and helper methods

Setup

  1. Clone the repo.
  2. Put the chromedriver in the utils directory.
  3. Setup virual enviroment python -m venv leetcode.
  4. Install necessary packages pip install -r requirements.txt.
  5. To create the reports npm install vega-lite vega-cli canvas(needed to save altair plots).

Scraping

$ export PTYHONPATH=<project_directory>
$ python leetcode/posts_meta.py --till_date 2021/08/03

# sample output
2021-08-03 19:36:07.474 | INFO     | __main__:<module>:48 - page no: 1 | # posts: 15
$ python leetcode/posts.py

# sample output
2021-08-03 19:36:25.997 | INFO     | __main__:<module>:45 - post_id: 1380805 done!
2021-08-03 19:36:28.995 | INFO     | __main__:<module>:45 - post_id: 1380646 done!
2021-08-03 19:36:31.631 | INFO     | __main__:<module>:45 - post_id: 1380542 done!
2021-08-03 19:36:34.727 | INFO     | __main__:<module>:45 - post_id: 1380068 done!
2021-08-03 19:36:37.280 | INFO     | __main__:<module>:45 - post_id: 1379990 done!
2021-08-03 19:36:40.509 | INFO     | __main__:<module>:45 - post_id: 1379903 done!
2021-08-03 19:36:41.096 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1379487
2021-08-03 19:36:44.530 | INFO     | __main__:<module>:45 - post_id: 1379487 done!
2021-08-03 19:36:47.115 | INFO     | __main__:<module>:45 - post_id: 1379208 done!
2021-08-03 19:36:49.660 | INFO     | __main__:<module>:45 - post_id: 1378689 done!
2021-08-03 19:36:50.470 | WARNING  | __main__:<module>:34 - sleeping extra for post_id: 1378620
2021-08-03 19:36:53.866 | INFO     | __main__:<module>:45 - post_id: 1378620 done!
2021-08-03 19:36:57.203 | INFO     | __main__:<module>:45 - post_id: 1378334 done!
2021-08-03 19:37:00.570 | INFO     | __main__:<module>:45 - post_id: 1378288 done!
2021-08-03 19:37:03.226 | INFO     | __main__:<module>:45 - post_id: 1378181 done!
2021-08-03 19:37:05.895 | INFO     | __main__:<module>:45 - post_id: 1378113 done!

Report DataFrame

$ ipython

In [1]: from info.all_info import get_clean_records_for_india                                                               
In [2]: df = get_clean_records_for_india()                                                                                  
2021-08-04 15:47:11.615 | INFO     | info.all_info:get_raw_records:95 - n records: 4134
2021-08-04 15:47:11.616 | WARNING  | info.all_info:get_raw_records:97 - missing post_ids: ['1347044', '1193859', '1208031', '1352074', '1308645', '1206533', '1309603', '1308672', '1271172', '214751', '1317751', '1342147', '1308728', '1138584']
2021-08-04 15:47:11.696 | WARNING  | info.all_info:_save_unmapped_labels:54 - 35 unmapped company saved
2021-08-04 15:47:11.705 | WARNING  | info.all_info:_save_unmapped_labels:54 - 353 unmapped title saved
2021-08-04 15:47:11.708 | WARNING  | info.all_info:get_clean_records_for_india:122 - 1779 rows dropped(location!=india)
2021-08-04 15:47:11.709 | WARNING  | info.all_info:get_clean_records_for_india:128 - 385 rows dropped(incomplete info)
2021-08-04 15:47:11.710 | WARNING  | info.all_info:get_clean_records_for_india:134 - 7 rows dropped(internships)
In [3]: df.shape                                                                                                            
Out[3]: (1963, 14)

Report

$ python reports/plots.py # generate fixed comp. plots
$ python reports/report.py # fixed comp.
$ python reports/report_dark.py # fixed comp., dark mode

$ python reports/plots_tc.py # generate total comp. plots
$ python reports/report_tc.py # total comp.
$ python reports/report_dark.py # total comp., dark mode

Samples

title : Flipkart | Software Development Engineer-1 | Bangalore
url : https://leetcode.com/discuss/compensation/834212/Flipkart-or-Software-Development-Engineer-1-or-Bangalore
company : flipkart
title : sde 1
yoe : 0.0 years
salary : ₹ 1800000.0
location : bangalore
post Education: B.Tech from NIT (2021 passout) Years of Experience: 0 Prior Experience: Fresher Date of the Offer: Aug 2020 Company: Flipkart Title/Level: Software Development Engineer-1 Location: Bangalore Salary: INR 18,00,000 Performance Incentive: INR 1,80,000 (10% of base pay) ESOPs: 48 units => INR 5,07,734 (vested over 4 years. 25% each year) Relocation Reimbursement: INR 40,000 Telephone Reimbursement: INR 12,000 Home Broadband Reimbursement: INR 12,000 Gratuity: INR 38,961 Insurance: INR 27,000 Other Benefits: INR 40,000 (15 days accomodation + travel) (this is different from the relocation reimbursement) Total comp (Salary + Bonus + Stock): Total CTC: INR 26,57,695; First year: INR 22,76,895 Other details: Standard Offer for On-Campus Hire Allowed Branches: B.Tech CSE/IT (6.0 CGPA & above) Process consisted of Coding test & 3 rounds of interviews. I don't remember questions exactly. But they vary from topics such as Graph(Topological Sort, Bi-Partite Graph), Trie based questions, DP based questions both recursive and dp approach, trees, Backtracking.

title : Cloudera | SSE | Bangalore | 2019
url : https://leetcode.com/discuss/compensation/388432/Cloudera-or-SSE-or-Bangalore-or-2019
company : cloudera
title : sde 2
yoe : 2.5 years
salary : ₹ 2800000.0
location : bangalore
post Education: MTech from Tier 1 College Years of Experience: 2.5 Prior Experience: SDE at Flipkart Date of the Offer: Sept 10, 2019 Company: Cloudera Title/Level: Senior Software Engineer (SSE) Location: Bangalore, India Salary: Rs 28,00,000 Bonus: Rs 2,80,000 (10 % of base) PF & Gratuity: Rs 1,88,272 Stock bonus: 5000 units over 4 years ($9 per unit) Other Benefits: Rs 4,00,000 (Health, Term Life and Personal Accident Insurance, Annual Medical Health Checkup, Transportation, Education Reimbursement) Total comp (Salary + Bonus + Stock): Rs 4070572

title : Amadeus Labs | MTS | Bengaluru
url : https://leetcode.com/discuss/compensation/1109046/Amadeus-Labs-or-MTS-or-Bengaluru
company : amadeus labs
title : mts 1
yoe : 7.0 years
salary : ₹ 1700000.0
location : bangalore
post Education: B.Tech. in ECE Years of Experience: 7 Prior Experience: Worked at few MNCs Date of the Offer: Jan 2021 Company: Amadeus Labs Title/Level: Member of Technical Staff Location: Bengaluru, India Salary: ₹ 1,700,000 Signing Bonus: ₹ 50,000 Stock bonus: None Bonus: 137,000 Total comp (Salary + Bonus + Stock): ~₹1,887,000 Benefits: Employee and family Insurance

Owner
utsav
Lead MLE @ freshworks
utsav
Extract Thailand COVID-19 Cluster data from daily briefing pdf.

Thailand COVID-19 Cluster Data Extraction About Extract Clusters from Thailand Daily COVID-19 briefing PDF Download latest data Here. Data will be upd

Noppakorn Jiravaranun 5 Sep 27, 2021
PyChemia, Python Framework for Materials Discovery and Design

PyChemia, Python Framework for Materials Discovery and Design PyChemia is an open-source Python Library for materials structural search. The purpose o

Materials Discovery Group 61 Oct 02, 2022
Top 50 best selling books on amazon

It's a dashboard that shows the detailed information about each book in the top 50 best selling books on amazon over the last ten years

Nahla Tarek 1 Nov 18, 2021
Full ELT process on GCP environment.

Rent Houses Germany - GCP Pipeline Project: The goal of the project is to extract data about house rentals in Germany, store, process and analyze it u

Felipe Demenech Vasconcelos 2 Jan 20, 2022
Modular analysis tools for neurophysiology data

Neuroanalysis Modular and interactive tools for analysis of neurophysiology data, with emphasis on patch-clamp electrophysiology. Functions for runnin

Allen Institute 5 Dec 22, 2021
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is

6 Sep 07, 2022
Display the behaviour of a realtime program with a scope or logic analyser.

1. A monitor for realtime MicroPython code This library provides a means of examining the behaviour of a running system. It was initially designed to

Peter Hinch 17 Dec 05, 2022
Demonstrate a Dataflow pipeline that saves data from an API into BigQuery table

Overview dataflow-mvp provides a basic example pipeline that pulls data from an API and writes it to a BigQuery table using GCP's Dataflow (i.e., Apac

Chris Carbonell 1 Dec 03, 2021
Bamboolib - a GUI for pandas DataFrames

Community repository of bamboolib bamboolib is joining forces with Databricks. For more information, please read our announcement. Please note that th

Tobias Krabel 863 Jan 08, 2023
A data analysis using python and pandas to showcase trends in school performance.

A data analysis using python and pandas to showcase trends in school performance. A data analysis to showcase trends in school performance using Panda

Jimmy Faccioli 0 Sep 07, 2021
MS in Data Science capstone project. Studying attacks on autonomous vehicles.

Surveying Attack Models for CAVs Guide to Installing CARLA and Collecting Data Our project focuses on surveying attack models for Connveced Autonomous

Isabela Caetano 1 Dec 09, 2021
Python Implementation of Scalable In-Memory Updatable Bitmap Indexing

PyUpBit CS490 Large Scale Data Analytics — Implementation of Updatable Compressed Bitmap Indexing Paper Table of Contents About The Project Usage Cont

Hyeong Kyun (Daniel) Park 1 Jun 28, 2022
ASOUL直播间弹幕抓取&&数据分析

ASOUL直播间弹幕抓取&&数据分析(更新中) 这些文件用于爬取ASOUL直播间的弹幕(其他直播间也可以)和其他信息,以及简单的数据分析生成。

159 Dec 10, 2022
Exploratory Data Analysis for Employee Retention Dataset

Exploratory Data Analysis for Employee Retention Dataset Employee turn-over is a very costly problem for companies. The cost of replacing an employee

kana sudheer reddy 2 Oct 01, 2021
Yet Another Workflow Parser for SecurityHub

YAWPS Yet Another Workflow Parser for SecurityHub "Screaming pepper" by Rum Bucolic Ape is licensed with CC BY-ND 2.0. To view a copy of this license,

myoung34 8 Dec 22, 2022
Pipetools enables function composition similar to using Unix pipes.

Pipetools Complete documentation pipetools enables function composition similar to using Unix pipes. It allows forward-composition and piping of arbit

186 Dec 29, 2022
VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

André Rodrigues 2 Feb 14, 2022
A Python module for clustering creators of social media content into networks

sm_content_clustering A Python module for clustering creators of social media content into networks. Currently supports identifying potential networks

72 Dec 30, 2022
Business Intelligence (BI) in Python, OLAP

Open Mining Business Intelligence (BI) Application Server written in Python Requirements Python 2.7 (Backend) Lua 5.2 or LuaJIT 5.1 (OML backend) Mong

Open Mining 1.2k Dec 27, 2022
Renato 214 Jan 02, 2023