A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep

Overview

Here are the sections:

Data Science Cheatsheets

This section contains cheatsheets on basic data science concepts that commonly come up in interviews.

Data Science EBooks

This section contains books on data science and machine learning that I have read.

Data Science Question Bank

This section contains sample questions that were asked in actual data science interviews.

Data Science Case Studies

This section contains case study questions that concern designing machine learning systems to solve practical problems.

Data Science Portfolio

This section contains a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes.

For a more visually pleasant way to browse the portfolio, check out jameskle.com/data-portfolio.

  • Recommendation Systems

    • Transfer Rec: My ongoing research at the intersection of deep learning and recommendation systems.

    • Movie Recommendation: Designed 4 different models that recommend items on the MovieLens dataset.

    Tools: PyTorch, TensorBoard, Keras, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-Learn, Surprise, Wordcloud

  • Machine Learning

    • Trip Optimizer: Used XGBoost and evolutionary algorithms to optimize the travel time for taxi vehicles in New York City.

    • Instacart Market Basket Analysis: Tackled the Instacart Market Basket Analysis challenge to predict which products will be in a user's next order.

    Tools: Pandas, NumPy, Matplotlib, XGBoost, Geopy, Scikit-Learn

  • Computer Vision

    • Fashion Recommendation: Built a ResNet-based model that classifies and recommends fashion images in the DeepFashion database based on semantic similarity.

    • Fashion Classification: Developed 4 different Convolutional Neural Networks that classify images in the Fashion MNIST dataset.

    • Dog Breed Classification: Designed a Convolutional Neural Network that identifies dog breeds.

    • Road Segmentation: Implemented a Fully Convolutional Network for semantic segmentation on the KITTI Road dataset.

    Tools: TensorFlow, Keras, Pandas, NumPy, Matplotlib, Scikit-Learn, TensorBoard

  • Natural Language Processing

  • Data Analysis and Visualization

    • World Cup 2018 Team Analysis: Analysis and visualization of the FIFA 18 dataset to predict the best possible international squad lineups for 10 teams at the 2018 World Cup in Russia.

    • Spotify Artists Analysis: Analysis and visualization of musical styles from 50 different artists with a wide range of genres on Spotify.

    Tools: Pandas, NumPy, Matplotlib, Rspotify, httr, dplyr, tidyr, radarchart, ggplot2

Data Journalism Portfolio

This section contains a portfolio of data journalism articles that I wrote for freelance clients and for self-learning purposes.

For a more visually pleasant way to browse the portfolio, check out jameskle.com/data-journalism.

Downloadable Cheatsheets

These PDF cheatsheets come from BecomingHuman.AI.

1 - Neural Network Basics

2 - Neural Network Graphs

3 - Machine Learning with Emojis

4 - Scikit-Learn With Python

5 - Python Basics

6 - NumPy Basics

7 - Pandas Basics

8 - Data Wrangling With Pandas (Part 1 and Part 2)

9 - SciPy Linear Algebra

10 - Matplotlib Basics

11 - Keras

12 - Big-O

Owner
James Le
Data Journalist 📝 -> Data Scientist 📊 -> Machine Learning Researcher 🔍 -> Data Advocate 🤝