Scraping Connections' Info on LinkedIn

Last update: Feb 11, 2022

Overview

Scrape It!

Disclaimer:

THIS CODE HAS BEEN IMPLEMENTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE INTERVIEW PROCESS OF MCI.IR, AND INTERVIEWEES WERE SUPPOSED TO PUSH THE CODE TO THEIR GITHUB. CONTACT ME TO REMOVE THIS REPOSITORY IF IT IS AGAINST YOUR TOS.
IF ANY CONNECTION DOES NOT WANT THEIR CONTACT INFO TO APPEAR HERE, CONTACT ME TO REMOVE IT ASAP.

Functionalities:

This script automatically:

opens your LinkedIn profile
accesses your connections page
crawls the page to collect your connections' profile links
scrapes each person's information and dumps it to a SQLite database
logs messages at the appropriate levels to Linkedin.log while it runs
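
The last three steps can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual code: the function names, the `connections` table layout, the logger name, and the rule that a profile link contains "/in/" are all assumptions made for the example. The page HTML is assumed to come from the browser session (for instance Selenium's `driver.page_source`).

```python
import logging
import sqlite3
from html.parser import HTMLParser


class ProfileLinkParser(HTMLParser):
    """Collect unique hrefs that look like profile links (assumed to contain '/in/')."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/in/" in href and href not in self.links:
            self.links.append(href)


def extract_profile_links(page_html):
    """Return profile links found in the connections page HTML, in page order."""
    parser = ProfileLinkParser()
    parser.feed(page_html)
    return parser.links


def configure_logging(path="Linkedin.log"):
    """Send INFO-and-above messages to the log file named in this README."""
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )


def save_connections(conn, rows):
    """Insert (profile_url, name) rows; the table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS connections ("
        "profile_url TEXT PRIMARY KEY, name TEXT)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO connections (profile_url, name) VALUES (?, ?)",
            rows,
        )
    logging.getLogger("linkedin").info("saved %d connection rows", len(rows))
```

A typical run would call `configure_logging()` once, pass the page HTML to `extract_profile_links`, scrape each link, and hand the collected rows to `save_connections` with a connection from `sqlite3.connect(...)`. Using the profile URL as the primary key means re-running the script updates existing rows instead of duplicating them.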

Data Flow Diagram

Design patterns applied include (but are not limited to):

Creator
Low Coupling
High Cohesion
Indirection
Modularization
Information Expert

Log/DB files:

Further development notes:

Check out other databases that support multithreading, which would enable us to dump all information rows at once.
Change the IP address per request (the code for this is in my "Social Media Computing course" repository).
Sometimes you need to scroll down manually while the "connection" page is loading. One added line of code could scroll down for you.
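
The scrolling note could be handled roughly as follows. This is a sketch under assumptions, not code from the repository: `scroll_to_bottom`, `pause_seconds`, and `max_rounds` are hypothetical names, and `driver` is assumed to be a Selenium WebDriver (any object with an `execute_script` method works). The one-line scroll is the `window.scrollTo` call; the loop around it repeats the scroll until the page stops growing.

```python
import time


def scroll_to_bottom(driver, pause_seconds=1.0, max_rounds=20):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        # The one-line scroll: jump to the current bottom of the page.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give lazily loaded connections time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds
```

Calling `scroll_to_bottom(driver)` right after the connections page opens, and before collecting profile links, would replace the manual scroll. The pause length is a guess and may need tuning for slow connections.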

References:

https://www.linkedin.com/pulse/how-easy-scraping-data-from-linkedin-profiles-david-craven

https://www.geeksforgeeks.org/scrape-linkedin-using-selenium-and-beautiful-soup-in-python/

https://stackoverflow.com/questions/28883769/remove-odd-indexed-elements-from-list-in-python

https://stackoverflow.com/questions/34759787/fetch-all-href-link-using-selenium-in-python

https://www.tutorialspoint.com/fetch-all-href-link-using-selenium-in-python

https://stackoverflow.com/questions/64717302/deprecationwarning-executable-path-has-been-deprecated-selenium-python

https://chromedriver.chromium.org/home

https://www.youtube.com/watch?v=-ARI4Cz-awo

Owner

MohammadReza Ardestani
