Tutorial on scikit-learn and IPython for parallel machine learning

Overview

Parallel Machine Learning with scikit-learn and IPython

Video Tutorial

Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearranged in part and extended. Look at the titles of the notebooks to follow along with the presentation.

Browse the static notebooks on nbviewer.ipython.org.

Scope of this tutorial:

  • Learn common machine learning concepts and how they match the scikit-learn Estimator API.

  • Learn about scalable feature extraction for text classification and clustering.

  • Learn how to run cross-validation and hyperparameter grid search in parallel with IPython.

  • Learn to analyze the kinds of common errors predictive models are subject to and how to refine your modeling to take this analysis into account.

  • Learn to optimize memory allocation on your computing nodes with numpy memory mapping features.

  • Learn how to run a cheap IPython cluster for interactive predictive modeling on Amazon EC2 spot instances using StarCluster.
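As a taste of the memory mapping item in the list above, here is a minimal sketch of how numpy.memmap can back an array with a file on disk. It is an illustration only and is not taken from the tutorial notebooks: the file name features.mmap and the tiny array are assumptions chosen for the example.

```python
import os
import tempfile

import numpy as np

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "features.mmap")

# Write an array to disk once, for instance from the process that loads the data.
data = np.arange(12, dtype=np.float64).reshape(3, 4)
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=data.shape)
writer[:] = data
writer.flush()  # make sure the bytes reach the file
del writer

# Other processes can then open the same file read-only. The dtype and
# shape are not stored in the file, so they have to be passed again.
reader = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 4))

# The memmap behaves like a regular array; data is read from disk on access.
column_means = reader.mean(axis=0)
print(column_means)
```

Because the array lives in a file rather than in each process's private memory, several workers on the same machine can typically read it without each holding a full copy.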

Target audience

This tutorial targets developers with some experience with scikit-learn and machine learning concepts in general.

It is recommended to first go through one of the tutorials hosted at scikit-learn.org if you are new to scikit-learn.

You might also want to have a look at the SciPy Lecture Notes first if you are new to the NumPy / SciPy / matplotlib ecosystem.

Setup

Install the latest stable versions of NumPy, SciPy, matplotlib, IPython, psutil, and scikit-learn (e.g. IPython 2.2.0 and scikit-learn 0.15.2 at the time of writing).

You can find up-to-date installation instructions on scikit-learn.org and ipython.org.

To check your installation, launch the ipython interactive shell in a console and type the following import statements to check each library:

>>> import numpy
>>> import scipy
>>> import matplotlib
>>> import psutil
>>> import sklearn

If you don't get any message, everything is fine. If you get an error message, please ask for help on the mailing list of the matching project and don't forget to mention the version of the library you are trying to install along with the type of platform and version (e.g. Windows 8.1, Ubuntu 14.04, OSX 10.9...).
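If you prefer a single script over typing each import, the check can be sketched as follows. This is a hypothetical helper, not part of the tutorial repository: the function name check_libraries and the REQUIRED list are assumptions made for illustration. It also prints the version of each library, which is the information to include when asking for help.

```python
import importlib

# Import names of the libraries listed in the Setup section;
# "sklearn" is the import name of scikit-learn.
REQUIRED = ["numpy", "scipy", "matplotlib", "IPython", "psutil", "sklearn"]


def check_libraries(names):
    """Map each module name to its version string, or None if the import fails."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_libraries(REQUIRED).items():
        status = version if version is not None else "NOT INSTALLED"
        print("%-12s %s" % (name, status))
```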

You can exit the ipython shell by typing exit.

Fetching the data

It is recommended to fetch the datasets before diving into the tutorial material itself. To do so, run the fetch_data.py script in this folder:

$ python fetch_data.py

Using the IPython notebook to follow the tutorial

The tutorial material and exercises are hosted in a set of IPython executable notebook files.

To run them interactively do:

$ cd notebooks
$ ipython notebook

This should automatically open a new browser window listing all the notebooks of the folder.

You can then execute the cells in order by pressing "Shift-Enter": the output is displayed directly under each cell and the cursor moves on to the next one. Go to the "Help" menu for links to the notebook tutorial.

Credits

Some of this material is adapted from the scipy 2013 tutorial:

http://github.com/jakevdp/sklearn_scipy2013

Original authors:

Owner
Olivier Grisel
Machine Learning Engineer at Inria Saclay (Parietal team).