Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining

Overview

logo


**Tutorials, examples, collections, and everything else that falls into the categories: pattern classification, machine learning, and data mining.**



Sections



[Download a PDF version] of this flowchart.






Introduction to Machine Learning and Pattern Classification

[back to top]

  • Predictive modeling, supervised machine learning, and pattern classification - the big picture [Markdown]

  • Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]

  • An Introduction to simple linear supervised classification using scikit-learn [IPython nb]






Pre-processing

[back to top]

  • Feature Extraction

    • Tips and Tricks for Encoding Categorical Features in Classification Tasks [IPython nb]
  • Scaling and Normalization

    • About Feature Scaling: Standardization and Min-Max-Scaling (Normalization) [IPython nb]
  • Feature Selection

    • Sequential Feature Selection Algorithms [IPython nb]
  • Dimensionality Reduction

    • Principal Component Analysis (PCA) [IPython nb]
    • The effect of scaling and mean centering of variables prior to a PCA [PDF] [HTML]
    • PCA based on the covariance vs. correlation matrix [IPython nb]
    • Linear Discriminant Analysis (LDA) [IPython nb]
      • Kernel tricks and nonlinear dimensionality reduction via PCA [IPython nb]
  • Representing Text

    • Tf-idf Walkthrough for scikit-learn [IPython nb]



Model Evaluation

[back to top]

  • An Overview of General Performance Metrics of Binary Classifier Systems [PDF]
  • Cross-validation
    • Streamline your cross-validation workflow - scikit-learn's Pipeline in action [IPython nb]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part I [Markdown]
  • Model evaluation, model selection, and algorithm selection in machine learning - Part II [Markdown]



Parameter Estimation

[back to top]

  • Parametric Techniques

    • Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    • How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
  • Non-Parametric Techniques

    • Kernel density estimation via the Parzen-window technique [IPython nb]
    • The K-Nearest Neighbor (KNN) technique
  • Regression Analysis

    • Linear Regression

    • Non-Linear Regression




Machine Learning Algorithms

[back to top]

Bayes Classification

  • Naive Bayes and Text Classification I - Introduction and Theory [PDF]

Logistic Regression

  • Out-of-core Learning and Model Persistence using scikit-learn [IPython nb]

Neural Networks

  • Artificial Neurons and Single-Layer Neural Networks - How Machine Learning Algorithms Work Part 1 [IPython nb]

  • Activation Function Cheatsheet [IPython nb]

Ensemble Methods

  • Implementing a Weighted Majority Rule Ensemble Classifier in scikit-learn [IPython nb]

Decision Trees

  • Cheatsheet for Decision Tree Classification [IPython nb]



Clustering

[back to top]

  • Protoype-based clustering
  • Hierarchical clustering
    • Complete-Linkage Clustering and Heatmaps in Python [IPython nb]
  • Density-based clustering
  • Graph-based clustering
  • Probabilistic-based clustering



Collecting Data

[back to top]

  • Collecting Fantasy Soccer Data with Python and Beautiful Soup [IPython nb]

  • Download Your Twitter Timeline and Turn into a Word Cloud Using Python [IPython nb]

  • Reading MNIST into NumPy arrays [IPython nb]




Data Visualization

[back to top]

  • Exploratory Analysis of the Star Wars API [IPython nb]

  • Matplotlib examples -Exploratory data analysis of the Iris dataset [IPython nb]

  • Artificial Intelligence publications per country

[IPython nb] [PDF]




Statistical Pattern Classification Examples

[back to top]

  • Supervised Learning

    • Parametric Techniques

      • Univariate Normal Density

        • Ex1: 2-classes, equal variances, equal priors [IPython nb]
        • Ex2: 2-classes, different variances, equal priors [IPython nb]
        • Ex3: 2-classes, equal variances, different priors [IPython nb]
        • Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
        • Ex5: 2-classes, different variances, equal priors, loss function, cauchy distr. [IPython nb]
      • Multivariate Normal Density

        • Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
        • Ex7: 2-classes, equal variances, equal priors [IPython nb]
    • Non-Parametric Techniques




Books

[back to top]

Python Machine Learning




Talks

[back to top]

An Introduction to Supervised Machine Learning and Pattern Classification: The Big Picture

[View on SlideShare]

[Download PDF]



MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

[View on SlideShare]

[Download PDF]




Applications

[back to top]

MusicMood - Machine Learning in Automatic Music Mood Prediction Based on Song Lyrics

This project is about building a music recommendation system for users who want to listen to happy songs. Such a system can not only be used to brighten up one's mood on a rainy weekend; especially in hospitals, other medical clinics, or public locations such as restaurants, the MusicMood classifier could be used to spread positive mood among people.

[musicmood GitHub Repository]


mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.

[mlxtend GitHub Repository]




Resources

[back to top]

  • Copy-and-paste ready LaTex equations [Markdown]

  • Open-source datasets [Markdown]

  • Free Machine Learning eBooks [Markdown]

  • Terms in data science defined in less than 50 words [Markdown]

  • Useful libraries for data science in Python [Markdown]

  • General Tips and Advices [Markdown]

  • A matrix cheatsheat for Python, R, Julia, and MATLAB [HTML]

Owner
Sebastian Raschka
Machine Learning researcher & passionate open source contributor. Author of the "Python Machine Learning" book.
Sebastian Raschka
Data Version Control or DVC is an open-source tool for data science and machine learning projects

Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj

Azaria Gebremichael 2 Jul 29, 2021
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

Daniel Formoso 5.7k Dec 30, 2022
Probabilistic time series modeling in Python

GluonTS - Probabilistic Time Series Modeling in Python GluonTS is a Python toolkit for probabilistic time series modeling, built around Apache MXNet (

Amazon Web Services - Labs 3.3k Jan 03, 2023
Built on python (Mathematical straight fit line coordinates error predictor machine learning foundational model)

Sum-Square_Error-Business-Analytical-Tool- Built on python (Mathematical straight fit line coordinates error predictor machine learning foundational m

om Podey 1 Dec 03, 2021
Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

sklearn-evaluation Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking, and Jupyter notebook analysis. Suppo

Eduardo Blancas 354 Dec 31, 2022
ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more

ml4h is a toolkit for machine learning on clinical data of all kinds including genetics, labs, imaging, clinical notes, and more

Broad Institute 65 Dec 20, 2022
Laporan Proyek Machine Learning - Azhar Rizki Zulma

Laporan Proyek Machine Learning - Azhar Rizki Zulma Project Overview Domain proyek yang dipilih dalam proyek machine learning ini adalah mengenai hibu

Azhar Rizki Zulma 6 Mar 12, 2022
Ml based project which uses regression technique to predict the price.

Price-Predictor Ml based project which uses regression technique to predict the price. I have used various regression models and finds the model with

Garvit Verma 1 Jul 09, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.3k Dec 31, 2022
AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications.

AutoTabular automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy m

Robin 55 Dec 27, 2022
AI and Machine Learning with Kubeflow, Amazon EKS, and SageMaker

Data Science on AWS - O'Reilly Book Get the book on Amazon.com Book Outline Quick Start Workshop (4-hours) In this quick start hands-on workshop, you

Data Science on AWS 2.8k Jan 03, 2023
Case studies with Bayesian methods

Case studies with Bayesian methods

Baze Petrushev 8 Nov 26, 2022
Uber Open Source 1.6k Dec 31, 2022
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Dec 29, 2022
To design and implement the Identification of Iris Flower species using machine learning using Python and the tool Scikit-Learn.

To design and implement the Identification of Iris Flower species using machine learning using Python and the tool Scikit-Learn.

Astitva Veer Garg 1 Jan 11, 2022
Short PhD seminar on Machine Learning Security (Adversarial Machine Learning)

Short PhD seminar on Machine Learning Security (Adversarial Machine Learning)

141 Dec 27, 2022
PyHarmonize: Adding harmony lines to recorded melodies in Python

PyHarmonize: Adding harmony lines to recorded melodies in Python About To use this module, the user provides a wav file containing a melody, the key i

Julian Kappler 2 May 20, 2022
Self Organising Map (SOM) for clustering of atomistic samples through unsupervised learning.

Self Organising Map for Clustering of Atomistic Samples - V2 Description Self Organising Map (also known as Kohonen Network) implemented in Python for

Franco Aquistapace 0 Nov 16, 2021
The Fuzzy Labs guide to the universe of open source MLOps

Open Source MLOps This is the Fuzzy Labs guide to the universe of free and open source MLOps tools. Contents What is MLOps, anyway? Data version contr

Fuzzy Labs 352 Dec 29, 2022
LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading

LiuAlgoTrader is a scalable, multi-process ML-ready framework for effective algorithmic trading. The framework simplify development, testing, deployment, analysis and training algo trading strategies

Amichay Oren 458 Dec 24, 2022