Deploy AutoML as a service using Flask

Last update: Nov 04, 2022

Overview

AutoML Service

Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving.

The framework implements a fully automated time series classification pipeline. It automates feature engineering with tsfresh, and model selection and optimization with TPOT, both Python libraries.

Check out the blog post for more info.

Resources:

TPOT: Automated feature preprocessing and model optimization tool
tsfresh: Automated time series feature engineering and selection
Flask: A web development microframework for Python

Architecture

The application exposes both model training and model prediction through a RESTful API. To train, the client sends input data and labels in a POST request and the service trains a pipeline; predictions from that pipeline are then available through a prediction route.

Each trained pipeline is stored under a unique key, so live predictions can be made on the same data using different feature construction and modeling pipelines.
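As a rough illustration of the unique-key idea, here is a minimal in-memory sketch. The `PipelineRegistry` class, its method names, and the integer keys are assumptions made for this example (the integer key only mirrors the `modelId` field in the training response); the toy callables stand in for fitted pipelines and are not the app's actual storage code.

```python
import itertools
from typing import Callable, Dict, List

# Hypothetical stand-in for a fitted pipeline: any callable that maps a list
# of raw series to one score per series.
Pipeline = Callable[[List[List[float]]], List[float]]


class PipelineRegistry:
    """Keeps every trained pipeline under its own integer key."""

    def __init__(self) -> None:
        self._pipelines: Dict[int, Pipeline] = {}
        self._ids = itertools.count(1)  # keys 1, 2, 3, ...

    def register(self, pipeline: Pipeline) -> int:
        """Store a pipeline and return the key it was stored under."""
        model_id = next(self._ids)
        self._pipelines[model_id] = pipeline
        return model_id

    def predict(self, model_id: int, raw_data: List[List[float]]) -> List[float]:
        """Score raw_data with the pipeline stored under model_id."""
        return self._pipelines[model_id](raw_data)


registry = PipelineRegistry()
# Two toy "pipelines" that build different features (mean vs. max).
mean_id = registry.register(lambda data: [sum(s) / len(s) for s in data])
max_id = registry.register(lambda data: [max(s) for s in data])

same_data = [[0.0, 1.0, 2.0], [4.0, 4.0, 4.0]]
print(registry.predict(mean_id, same_data))  # [1.0, 4.0]
print(registry.predict(max_id, same_data))   # [2.0, 4.0]
```

Because the key is the only thing a caller needs, the same upload can be scored by any pipeline trained so far.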

An automated pipeline for time-series classification.

The model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.

For serving, raw, unlabeled data is uploaded via a POST request and a model prediction is returned.
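The two endpoints could be laid out roughly as follows. This is a sketch under stated assumptions, not the code in automl_service.py: the route names and the `raw_data` / `labels` file fields come from the usage examples in this README, while the `model_id` form field, the `PIPELINES` dict, and the placeholder "training" step are invented for illustration. The real app reads a YAML parameters file and fits a TPOT pipeline.

```python
import io
import json

from flask import Flask, jsonify, request

app = Flask(__name__)
PIPELINES = {}  # model id -> stand-in "pipeline"; in-memory only


@app.route("/train_pipeline", methods=["POST"])
def train_pipeline():
    # Uploaded files arrive as multipart form fields.
    raw_data = json.loads(request.files["raw_data"].read())
    labels = json.loads(request.files["labels"].read())
    model_id = len(PIPELINES) + 1
    # Placeholder "training": remember the share of positive labels.
    PIPELINES[model_id] = sum(labels) / len(labels)
    return jsonify({"modelId": model_id, "trainShape": [len(raw_data)]})


@app.route("/serve_prediction", methods=["POST"])
def serve_prediction():
    raw_data = json.loads(request.files["raw_data"].read())
    # Hypothetical field: selects which stored pipeline to use.
    model_id = int(request.form["model_id"])
    score = PIPELINES[model_id]
    return jsonify(
        [{"id": i + 1, "prediction": score} for i in range(len(raw_data))]
    )


# Exercise both routes with Flask's built-in test client (no server needed).
client = app.test_client()
train = client.post(
    "/train_pipeline",
    data={
        "raw_data": (io.BytesIO(b"[[1, 2], [3, 4]]"), "data_train.json"),
        "labels": (io.BytesIO(b"[0, 1]"), "label_train.json"),
    },
    content_type="multipart/form-data",
)
pred = client.post(
    "/serve_prediction",
    data={
        "raw_data": (io.BytesIO(b"[[5, 6]]"), "data_test.json"),
        "model_id": "1",
    },
    content_type="multipart/form-data",
)
print(train.get_json())  # {'modelId': 1, 'trainShape': [2]}
print(pred.get_json())   # [{'id': 1, 'prediction': 0.5}]
```

Note that this sketch returns plain JSON, whereas the client snippets in the Usage section decode the response twice.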

Using the app

View the Jupyter Notebook for an example.

Deploying

# deploy locally
python automl_service.py

# deploy on cloud foundry
cf push

Usage

Train a pipeline:

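import json
import requests

train_url = 'http://0.0.0.0:8080/train_pipeline'
# raw series, labels, and training parameters are uploaded as multipart files
train_files = {'raw_data': open('data/data_train.json', 'rb'),
               'labels'  : open('data/label_train.json', 'rb'),
               'params'  : open('parameters/train_parameters_model2.yml', 'rb')}

# post request to train pipeline
r_train = requests.post(train_url, files=train_files)
# r_train.json() is expected to yield a JSON string, hence the second decode
result_df = json.loads(r_train.json())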

returns:

{'featureEngParams': {'default_fc_parameters': "['median', 'minimum', 'standard_deviation', 
                                                 'sum_values', 'variance', 'maximum', 
                                                 'length', 'mean']",
                      'impute_function': 'impute',
                      ...},
 'mean_cv_accuracy': 0.865,
 'mean_cv_roc_auc': 0.932,
 'modelId': 1,
 'modelType': "Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),
                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])",
 'trainShape': [1647, 8],
 'trainTime': 1.953}
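To make the `default_fc_parameters` entry above more concrete, the sketch below computes the same eight summary statistics per series in plain Python. It is only an illustration: the row layout (`id`, `value`), the `summarize` and `extract_features` names, and the choice of population variance and standard deviation are assumptions, and the service itself delegates this step to tsfresh.

```python
import statistics
from collections import defaultdict


def summarize(values):
    """Compute the eight summary features named in the response above."""
    return {
        "median": statistics.median(values),
        "minimum": min(values),
        "standard_deviation": statistics.pstdev(values),  # population std (assumed)
        "sum_values": sum(values),
        "variance": statistics.pvariance(values),  # population variance (assumed)
        "maximum": max(values),
        "length": len(values),
        "mean": statistics.mean(values),
    }


def extract_features(rows):
    """Group rows by series id and summarize each series."""
    series = defaultdict(list)
    for row in rows:
        series[row["id"]].append(row["value"])
    return {series_id: summarize(vals) for series_id, vals in series.items()}


# Hypothetical long-format input: one row per observation.
rows = [
    {"id": 1, "value": 1.0},
    {"id": 1, "value": 3.0},
    {"id": 2, "value": 2.0},
    {"id": 2, "value": 2.0},
    {"id": 2, "value": 5.0},
]
features = extract_features(rows)
print(features[1]["mean"], features[1]["variance"])  # 2.0 1.0
print(features[2]["sum_values"], features[2]["length"])  # 9.0 3
```

Each series collapses to one fixed-length row, which is what lets a standard classifier be trained on variable-length time series.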

Serve pipeline predictions:

import pandas as pd
import requests

serve_url = 'http://0.0.0.0:8080/serve_prediction'
# raw series to score, plus the parameters selecting the trained pipeline
test_files = {'raw_data': open('data/data_test.json', 'rb'),
              'params' : open('parameters/test_parameters_model2.yml', 'rb')}

# post request to serve predictions from trained pipeline
r_test  = requests.post(serve_url, files=test_files)
result = pd.read_json(r_test.json()).set_index('id')

example_id	prediction
1	0.853
2	0.991
3	0.060
4	0.995
5	0.003
...	...
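The client code decodes the response twice (`r_test.json()` followed by a second parse), which suggests the body is a JSON string that itself contains JSON. The sketch below reproduces that shape with the standard library only, using the first three scores from the table above. The record layout and the 0.5 cut-off are assumptions for illustration; the cut-off only makes sense if the scores are class probabilities.

```python
import json

# Simulated response body: JSON wrapped in a JSON string (assumed server behavior).
records = [
    {"id": 1, "prediction": 0.853},
    {"id": 2, "prediction": 0.991},
    {"id": 3, "prediction": 0.060},
]
body = json.dumps(json.dumps(records))

inner = json.loads(body)  # first decode: still a str
predictions = {row["id"]: row["prediction"] for row in json.loads(inner)}

THRESHOLD = 0.5  # assumed cut-off for turning scores into class labels
labels = {key: int(score >= THRESHOLD) for key, score in predictions.items()}
print(labels)  # {1: 1, 2: 1, 3: 0}
```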

View all trained models:

r = requests.get('http://0.0.0.0:8080/models')
pipelines = json.loads(r.json())

{'1':
    {'mean_cv_accuracy': 0.873,
     'modelType': "RandomForestClassifier(...)",
     ...},
 '2':
    {'mean_cv_accuracy': 0.895,
     'modelType': "GradientBoostingClassifier(...)",
     ...},
 '3':
    {'mean_cv_accuracy': 0.859,
     'modelType': "LogisticRegressionClassifier(...)",
     ...},
...}
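Once the model listing is decoded into a dict, choosing which pipeline to serve from is a one-liner. A small sketch, using only the accuracies shown above (the other fields are omitted here):

```python
# Decoded /models response, reduced to the accuracies from the listing above.
pipelines = {
    "1": {"mean_cv_accuracy": 0.873},
    "2": {"mean_cv_accuracy": 0.895},
    "3": {"mean_cv_accuracy": 0.859},
}

# Pick the key whose pipeline has the highest cross-validated accuracy.
best_id = max(pipelines, key=lambda key: pipelines[key]["mean_cv_accuracy"])
print(best_id, pipelines[best_id]["mean_cv_accuracy"])  # 2 0.895
```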

Running the tests

Pass the host of the running app to the tests with the --host argument.

# use local app
py.test --host http://0.0.0.0:8080

# use cloud-deployed app
py.test --host http://ROUTE-HERE
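A custom option like `--host` is normally registered in a conftest.py file. The following is a sketch of how that might look, not the repository's actual test configuration: the default URL, the help text, and the `host` fixture name are assumptions.

```python
# conftest.py
import pytest


def pytest_addoption(parser):
    """Register a --host command-line option (the default is an assumption)."""
    parser.addoption(
        "--host",
        action="store",
        default="http://0.0.0.0:8080",
        help="Base URL of the running AutoML service",
    )


@pytest.fixture
def host(request):
    """Expose the --host value to tests, without a trailing slash."""
    return request.config.getoption("--host").rstrip("/")
```

A test can then take `host` as an argument and build URLs such as `host + "/models"`.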

Scaling the architecture

For production, I would suggest splitting training and serving into separate applications and putting a facade API in front of them. It would also be best to use a shared cache such as Redis or Pivotal Cloud Cache, so that other applications and multiple instances of the pipeline can access the trained model. Here is a potential architecture.

A scalable model training and model serving architecture.
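The shared-cache idea can be sketched as follows: the training app serializes a pipeline into the cache, and any serving instance loads it back by key. Everything named here is an assumption for illustration: `DictCache` is an in-process stand-in so the example runs without a server, the `pipeline:<id>` key format is invented, and the toy "pipeline" is plain data rather than a fitted model. A real Redis client exposes `set` and `get` calls of a similar shape.

```python
import pickle


class DictCache:
    """In-process stand-in for a shared key-value cache such as Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is missing


def save_pipeline(cache, model_id, pipeline):
    """Training side: serialize the pipeline and store it under its key."""
    cache.set(f"pipeline:{model_id}", pickle.dumps(pipeline))


def load_pipeline(cache, model_id):
    """Serving side: fetch and deserialize the pipeline for model_id."""
    blob = cache.get(f"pipeline:{model_id}")
    if blob is None:
        raise KeyError(f"no pipeline stored for model {model_id}")
    return pickle.loads(blob)


cache = DictCache()
# Toy "pipeline": plain data, so it pickles in any environment.
save_pipeline(cache, 1, {"feature": "mean", "cutoff": 0.5})
restored = load_pipeline(cache, 1)
print(restored)  # {'feature': 'mean', 'cutoff': 0.5}
```

Since unpickling executes code chosen by whoever wrote the bytes, the cache should only be writable by trusted applications.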

Author

Chris Rawles
