SigOpt wrappers for scikit-learn methods

Overview

SigOpt + scikit-learn Interfacing

Build Status

This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together

Getting Started

Install the sigopt_sklearn python modules with pip install sigopt_sklearn.

Sign up for an account at https://sigopt.com. To use the interfaces, you'll need your API token from the API tokens page.

SigOptSearchCV

The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using cross validation. A short example that tunes the parameters of an SVM on a small dataset is provided below

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'

iris = datasets.load_iris()

# define parameter domains
svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
    client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for best parameters and fits estimator
# on all data using best found configuration
clf.fit(iris.data, iris.target)

# clf.predict() now uses best found estimator
# clf.best_score_ contains CV score for best found estimator
# clf.best_params_ contains best found param configuration

The objective optimized by default is is the default score associated with an estimator. A custom objective can be used by passing the scoring option to the SigOptSearchCV constructor. Shown below is an example that uses the f1_score already implemented in sklearn

from sklearn.metrics import f1_score, make_scorer
f1_scorer = make_scorer(f1_score)

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5, scoring=f1_scorer,
    client_token=client_token, n_jobs=5, n_iter=50)

# perform CV search for best parameters
clf.fit(X, y)

XGBoostClassifier

SigOptSearchCV also works with XGBoost's XGBClassifier wrapper. A hyperparameter search over XGBClassifier models can be done using the same interface

import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
iris = datasets.load_iris()

xgb_params = {
  'learning_rate': (0.01, 0.5),
  'n_estimators': (10, 50),
  'max_depth': (3, 10),
  'min_child_weight': (6, 12),
  'gamma': (0, 0.5),
  'subsample': (0.6, 1.0),
  'colsample_bytree': (0.6, 1.)
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
    client_token=client_token, n_jobs=5, n_iter=70, verbose=1)

clf.fit(iris.data, iris.target)

SigOptEnsembleClassifier

This class concurrently trains and tunes several classification models within sklearn to facilitate model selection efforts when investigating new datasets.

You'll need to install the sigopt_sklearn library with the extra requirements of xgboost for this aspect of the library to work:

pip install sigopt_sklearn[ensemble]

A short example, using an activity recognition dataset is provided below We also have a video tutorial outlining how to run this example here:

SigOpt scikit-learn Tutorial

# Human Activity Recognition Using Smartphone
# https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset
import numpy as np
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, 'r') as f:
    for l in f:
      X.append(np.array([float(v) for v in l.split()]))
  X = np.vstack(X)
  return X

X_train = load_datafile('train/X_train.txt')
y_train = load_datafile('train/y_train.txt').ravel()
X_test = load_datafile('test/X_test.txt')
y_test = load_datafile('test/y_test.txt').ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here : https://sigopt.com/tokens
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
    client_token='<YOUR_CLIENT_TOKEN>')

# compare model performance on hold out set
ensemble_train_scores = [est.score(X_train,y_train) for est in sigopt_clf.estimator_ensemble]
ensemble_test_scores = [est.score(X_test,y_test) for est in sigopt_clf.estimator_ensemble]
data = sorted(zip([est.__class__.__name__
                        for est in sigopt_clf.estimator_ensemble], ensemble_train_scores, ensemble_test_scores),
                        reverse=True, key=lambda x: (x[2], x[1]))
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])

CV Fold Timeouts

SigOptSearchCV performs evaluations on cv folds in parallel using joblib. Timeouts are now supported in the master branch of joblib and SigOpt can use this timeout information to learn to avoid hyperparameter configurations that are too slow.

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
dataset = datasets.fetch_20newsgroups_vectorized()
X = dataset.data
y = dataset.target

# define parameter domains
svc_parameters  = {
  'kernel': ['linear', 'rbf'],
  'C': (0.5, 100),
  'max_iter': (10, 200),
  'tol': (1e-2, 1e-6)
}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set timeout = 180 seconds
# for each fit.  SigOpt will then avoid configurations that are too slow
clf = SigOptSearchCV(svr, svc_parameters, cv=5, opt_timeout=180,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)

Categoricals

SigOptSearchCV supports categorical parameters specified as list of string as the kernel parameter is in the SVM example:

svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

SigOpt also supports non-string valued categorical parameters. For example the hidden_layer_sizes parameter in the MLPRegressor example below,

parameters = {
  'activation': ['relu', 'tanh', 'logistic'],
  'solver': ['lbfgs', 'adam'],
  'alpha': (0.0001, 0.01),
  'learning_rate_init': (0.001, 0.1),
  'power_t': (0.001, 1.0),
  'beta_1': (0.8, 0.999),
  'momentum': (0.001, 1.0),
  'beta_2': (0.8, 0.999),
  'epsilon': (0.00000001, 0.0001),
  'hidden_layer_sizes': {
    'shallow': (100,),
    'medium': (10, 10),
    'deep': (10, 10, 10, 10)
  }
}
nn = MLPRegressor()
clf = SigOptSearchCV(nn, parameters, cv=5, cv_timeout=240,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
Owner
SigOpt
SigOpt
๐Ÿ˜Š Python module for face feature changing

PyWarping Python module for face feature changing Installation pip install pywarping If you get an error: No such file or directory: 'cmake': 'cmake',

Dopevog 10 Sep 10, 2021
A Nim frontend for pytorch, aiming to be mostly auto-generated and internally using ATen.

Master Release Pytorch - Py + Nim A Nim frontend for pytorch, aiming to be mostly auto-generated and internally using ATen. Because Nim compiles to C+

Giovanni Petrantoni 425 Dec 22, 2022
Official Implementation for Fast Training of Neural Lumigraph Representations using Meta Learning.

Fast Training of Neural Lumigraph Representations using Meta Learning Project Page | Paper | Data Alexander W. Bergman, Petr Kellnhofer, Gordon Wetzst

Alex 39 Oct 08, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning plugins for distributed training using the Ray distributed compu

167 Jan 02, 2023
Implementation of Kalman Filter in Python

Kalman Filter in Python This is a basic example of how Kalman filter works in Python. I do plan on refactoring and expanding this repo in the future.

Enoch Kan 35 Sep 11, 2022
Power Core Simulator!

Power Core Simulator Power Core Simulator is a simulator based off the Roblox game "Pinewood Builders Computer Core". In this simulator, you can choos

BananaJeans 1 Nov 13, 2021
Code for our paper "Multi-scale Guided Attention for Medical Image Segmentation"

Medical Image Segmentation with Guided Attention This repository contains the code of our paper: "'Multi-scale self-guided attention for medical image

Ashish Sinha 394 Dec 28, 2022
Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data ๐ŸŒˆ

Rainbow ๐ŸŒˆ An implementation of Rainbow DQN which reaches a median HNS of 205.7 after only 10M frames (the original Rainbow from Hessel et al. 2017 re

Dominik Schmidt 31 Dec 21, 2022
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
This repository contains the source code for the paper "DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks",

DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks Project Page | Video | Presentation | Paper | Data L

Facebook Research 281 Dec 22, 2022
2.86% and 15.85% on CIFAR-10 and CIFAR-100

Shake-Shake regularization This repository contains the code for the paper Shake-Shake regularization. This arxiv paper is an extension of Shake-Shake

Xavier Gastaldi 294 Nov 22, 2022
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector ๐Ÿ˜ท by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
A Tensorflow based library for Time Series Modelling with Gaussian Processes

Markovflow Documentation | Tutorials | API reference | Slack What does Markovflow do? Markovflow is a Python library for time-series analysis via prob

Secondmind Labs 24 Dec 12, 2022
Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction

Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction Requirements The code has been tested running under Python 3.7.4, with the foll

zshicode 84 Jan 01, 2023
Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

Cognitive Tools Lab 38 Jan 06, 2023
LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021 We propose a cross encoder model (LTR_CrossEncoder) for information retrieval, re-retrie

Hieu Duong 7 Jan 12, 2022
graph-theoretic framework for robust pairwise data association

CLIPPER: A Graph-Theoretic Framework for Robust Data Association Data association is a fundamental problem in robotics and autonomy. CLIPPER provides

MIT Aerospace Controls Laboratory 118 Dec 28, 2022
ANN model for prediction a spatio-temporal distribution of supercooled liquid in mixed-phase clouds using Doppler cloud radar spectra.

VOODOO Revealing supercooled liquid beyond lidar attenuation Explore the docs ยป Report Bug ยท Request Feature Table of Contents About The Project Built

remsens-lim 2 Apr 28, 2022
[ACM MM 2021] Yes, "Attention is All You Need", for Exemplar based Colorization

Transformer for Image Colorization This is an implemention for Yes, "Attention Is All You Need", for Exemplar based Colorization, and the current soft

Wang Yin 30 Dec 07, 2022
SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging.

SweiNet SweiNet is an uncertainty-quantifying shear wave speed (SWS) estimator for ultrasound shear wave elasticity (SWE) imaging. SweiNet takes as in

Felix Jin 3 Mar 31, 2022