SigOpt wrappers for scikit-learn methods

Overview

SigOpt + scikit-learn Interfacing

Build Status

This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together

Getting Started

Install the sigopt_sklearn python modules with pip install sigopt_sklearn.

Sign up for an account at https://sigopt.com. To use the interfaces, you'll need your API token from the API tokens page.

SigOptSearchCV

The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using cross validation. A short example that tunes the parameters of an SVM on a small dataset is provided below

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'

iris = datasets.load_iris()

# define parameter domains
svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
    client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for best parameters and fits estimator
# on all data using best found configuration
clf.fit(iris.data, iris.target)

# clf.predict() now uses best found estimator
# clf.best_score_ contains CV score for best found estimator
# clf.best_params_ contains best found param configuration

The objective optimized by default is is the default score associated with an estimator. A custom objective can be used by passing the scoring option to the SigOptSearchCV constructor. Shown below is an example that uses the f1_score already implemented in sklearn

from sklearn.metrics import f1_score, make_scorer
f1_scorer = make_scorer(f1_score)

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5, scoring=f1_scorer,
    client_token=client_token, n_jobs=5, n_iter=50)

# perform CV search for best parameters
clf.fit(X, y)

XGBoostClassifier

SigOptSearchCV also works with XGBoost's XGBClassifier wrapper. A hyperparameter search over XGBClassifier models can be done using the same interface

import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
iris = datasets.load_iris()

xgb_params = {
  'learning_rate': (0.01, 0.5),
  'n_estimators': (10, 50),
  'max_depth': (3, 10),
  'min_child_weight': (6, 12),
  'gamma': (0, 0.5),
  'subsample': (0.6, 1.0),
  'colsample_bytree': (0.6, 1.)
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
    client_token=client_token, n_jobs=5, n_iter=70, verbose=1)

clf.fit(iris.data, iris.target)

SigOptEnsembleClassifier

This class concurrently trains and tunes several classification models within sklearn to facilitate model selection efforts when investigating new datasets.

You'll need to install the sigopt_sklearn library with the extra requirements of xgboost for this aspect of the library to work:

pip install sigopt_sklearn[ensemble]

A short example, using an activity recognition dataset is provided below We also have a video tutorial outlining how to run this example here:

SigOpt scikit-learn Tutorial

# Human Activity Recognition Using Smartphone
# https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
wget https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset
import numpy as np
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, 'r') as f:
    for l in f:
      X.append(np.array([float(v) for v in l.split()]))
  X = np.vstack(X)
  return X

X_train = load_datafile('train/X_train.txt')
y_train = load_datafile('train/y_train.txt').ravel()
X_test = load_datafile('test/X_test.txt')
y_test = load_datafile('test/y_test.txt').ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here : https://sigopt.com/tokens
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
    client_token='<YOUR_CLIENT_TOKEN>')

# compare model performance on hold out set
ensemble_train_scores = [est.score(X_train,y_train) for est in sigopt_clf.estimator_ensemble]
ensemble_test_scores = [est.score(X_test,y_test) for est in sigopt_clf.estimator_ensemble]
data = sorted(zip([est.__class__.__name__
                        for est in sigopt_clf.estimator_ensemble], ensemble_train_scores, ensemble_test_scores),
                        reverse=True, key=lambda x: (x[2], x[1]))
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])

CV Fold Timeouts

SigOptSearchCV performs evaluations on cv folds in parallel using joblib. Timeouts are now supported in the master branch of joblib and SigOpt can use this timeout information to learn to avoid hyperparameter configurations that are too slow.

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here : https://sigopt.com/tokens
client_token = '<YOUR_SIGOPT_CLIENT_TOKEN>'
dataset = datasets.fetch_20newsgroups_vectorized()
X = dataset.data
y = dataset.target

# define parameter domains
svc_parameters  = {
  'kernel': ['linear', 'rbf'],
  'C': (0.5, 100),
  'max_iter': (10, 200),
  'tol': (1e-2, 1e-6)
}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set timeout = 180 seconds
# for each fit.  SigOpt will then avoid configurations that are too slow
clf = SigOptSearchCV(svr, svc_parameters, cv=5, opt_timeout=180,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)

Categoricals

SigOptSearchCV supports categorical parameters specified as list of string as the kernel parameter is in the SVM example:

svc_parameters  = {'kernel': ['linear', 'rbf'], 'C': (0.5, 100)}

SigOpt also supports non-string valued categorical parameters. For example the hidden_layer_sizes parameter in the MLPRegressor example below,

parameters = {
  'activation': ['relu', 'tanh', 'logistic'],
  'solver': ['lbfgs', 'adam'],
  'alpha': (0.0001, 0.01),
  'learning_rate_init': (0.001, 0.1),
  'power_t': (0.001, 1.0),
  'beta_1': (0.8, 0.999),
  'momentum': (0.001, 1.0),
  'beta_2': (0.8, 0.999),
  'epsilon': (0.00000001, 0.0001),
  'hidden_layer_sizes': {
    'shallow': (100,),
    'medium': (10, 10),
    'deep': (10, 10, 10, 10)
  }
}
nn = MLPRegressor()
clf = SigOptSearchCV(nn, parameters, cv=5, cv_timeout=240,
    client_token=client_token, n_jobs=5, n_iter=40)

clf.fit(X, y)
Owner
SigOpt
SigOpt
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting (RVM) English | 中文 Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specific

flow-dev 2 Aug 21, 2022
Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

Peter Sharpe 394 Dec 23, 2022
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma 🔥 News 2021-10

Jingtao Zhan 99 Dec 27, 2022
[TIP2020] Adaptive Graph Representation Learning for Video Person Re-identification

Introduction This is the PyTorch implementation for Adaptive Graph Representation Learning for Video Person Re-identification. Get started git clone h

WuYiming 41 Dec 12, 2022
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

730 Jan 09, 2023
Video-face-extractor - Video face extractor with Python

Python face extractor Setup Create the srcvideos and faces directories Put your

2 Feb 03, 2022
This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

This is the codebase for Diffusion Models Beat GANS on Image Synthesis.

OpenAI 3k Dec 26, 2022
Official implementation of "StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation" (SIGGRAPH 2021)

StyleCariGAN: Caricature Generation via StyleGAN Feature Map Modulation This repository contains the official PyTorch implementation of the following

Wonjong Jang 270 Dec 30, 2022
Mahadi-Now - This Is Pakistani Just Now Login Tools

PAKISTANI JUST NOW LOGIN TOOLS Install apt update apt upgrade apt install python

MAHADI HASAN AFRIDI 19 Apr 06, 2022
Deep functional residue identification

DeepFRI Deep functional residue identification Citing @article {Gligorijevic2019, author = {Gligorijevic, Vladimir and Renfrew, P. Douglas and Koscio

Flatiron Institute 156 Dec 25, 2022
CVPR2022 (Oral) - Rethinking Semantic Segmentation: A Prototype View

Rethinking Semantic Segmentation: A Prototype View Rethinking Semantic Segmentation: A Prototype View, Tianfei Zhou, Wenguan Wang, Ender Konukoglu and

Tianfei Zhou 239 Dec 26, 2022
Facial Image Inpainting with Semantic Control

Facial Image Inpainting with Semantic Control In this repo, we provide a model for the controllable facial image inpainting task. This model enables u

Ren Yurui 8 Nov 22, 2021
Code of paper "Compositionally Generalizable 3D Structure Prediction"

Compositionally Generalizable 3D Structure Prediction In this work, We bring in the concept of compositional generalizability and factorizes the 3D sh

Songfang Han 30 Dec 17, 2022
Deep Learning for Human Part Discovery in Images - Chainer implementation

Deep Learning for Human Part Discovery in Images - Chainer implementation NOTE: This is not official implementation. Original paper is Deep Learning f

Shintaro Shiba 63 Sep 25, 2022
TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning

TransZero++ This repository contains the testing code for the paper "TransZero++: Cross Attribute-guided Transformer for Zero-Shot Learning" submitted

Shiming Chen 6 Aug 16, 2022
Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

ViLT Code for the paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision" Install pip install -r requirements.txt pip

Wonjae Kim 922 Jan 01, 2023
给yolov5加个gui界面,使用pyqt5,yolov5是5.0版本

博文地址 https://xugaoxiang.com/2021/06/30/yolov5-pyqt5 代码执行 项目中使用YOLOv5的v5.0版本,界面文件是project.ui pip install -r requirements.txt python main.py 图片检测 视频检测

Xu GaoXiang 215 Dec 30, 2022
This repo includes our code for evaluating and improving transferability in domain generalization (NeurIPS 2021)

Transferability for domain generalization This repo is for evaluating and improving transferability in domain generalization (NeurIPS 2021), based on

gordon 9 Nov 29, 2022
A3C LSTM Atari with Pytorch plus A3G design

NEWLY ADDED A3G A NEW GPU/CPU ARCHITECTURE OF A3C FOR SUBSTANTIALLY ACCELERATED TRAINING!! RL A3C Pytorch NEWLY ADDED A3G!! New implementation of A3C

David Griffis 532 Jan 02, 2023
QueryFuzz implements a metamorphic testing approach to test Datalog engines.

Datalog is a popular query language with applications in several domains. Like any complex piece of software, Datalog engines may contain bugs. The mo

34 Sep 10, 2022