Deploy AutoML as a service using Flask

Overview

AutoML Service

Deploy automated machine learning (AutoML) as a service using Flask, for both pipeline training and pipeline serving.

The framework implements a fully automated time series classification pipeline, automating both feature engineering and model selection and optimization using Python libraries, TPOT and tsfresh.

Check out the blog post for more info.

Resources:

  • TPOT– Automated feature preprocessing and model optimization tool
  • tsfresh– Automated time series feature engineering and selection
  • Flask– A web development microframework for Python

Architecture

The application exposes both model training and model predictions with a RESTful API. For model training, input data and labels are sent via POST request, a pipeline is trained, and model predictions are accessible via a prediction route.

Pipelines are stored to a unique key, and thus, live predictions can be made on the same data using different feature construction and modeling pipelines.

An automated pipeline for time-series classification.

The model training logic is exposed as a REST endpoint. Raw, labeled training data is uploaded via a POST request and an optimal model is developed.

Raw training data is uploaded via a POST request and a model prediction is returned.

Using the app

View the Jupyter Notebook for an example.

Deploying

# deploy locally
python automl_service.py
# deploy on cloud foundry
cf push

Usage

Train a pipeline:

train_url = 'http://0.0.0.0:8080/train_pipeline'
train_files = {'raw_data': open('data/data_train.json', 'rb'),
               'labels'  : open('data/label_train.json', 'rb'),
               'params'  : open('parameters/train_parameters_model2.yml', 'rb')}

# post request to train pipeline
r_train = requests.post(train_url, files=train_files)
result_df = json.loads(r_train.json())

returns:

{'featureEngParams': {'default_fc_parameters': "['median', 'minimum', 'standard_deviation', 
                                                 'sum_values', 'variance', 'maximum', 
                                                 'length', 'mean']",
                      'impute_function': 'impute',
                      ...},
 'mean_cv_accuracy': 0.865,
 'mean_cv_roc_auc': 0.932,
 'modelId': 1,
 'modelType': "Pipeline(steps=[('stackingestimator', StackingEstimator(estimator=LinearSVC(...))),
                               ('logisticregression', LogisticRegressionClassifier(solver='liblinear',...))])"
 'trainShape': [1647, 8],
 'trainTime': 1.953}

Serve pipeline predictions:

serve_url = 'http://0.0.0.0:8080/serve_prediction'
test_files = {'raw_data': open('data/data_test.json', 'rb'),
              'params' : open('parameters/test_parameters_model2.yml', 'rb')}

# post request to serve predictions from trained pipeline
r_test  = requests.post(serve_url, files=test_files)
result = pd.read_json(r_test.json()).set_index('id')
example_id prediction
1 0.853
2 0.991
3 0.060
4 0.995
5 0.003
... ...

View all trained models:

r = requests.get('http://0.0.0.0:8080/models')
pipelines = json.loads(r.json())
{'1':
    {'mean_cv_accuracy': 0.873,
     'modelType': "RandomForestClassifier(...),
     ...},
 '2':
    {'mean_cv_accuracy': 0.895,
     'modelType': "GradientBoostingClassifier(...),
     ...},
 '3':
    {'mean_cv_accuracy': 0.859,
     'modelType': "LogisticRegressionClassifier(...),
     ...},
...}

Running the tests

Supply a user argument for the host.

# use local app
py.test --host http://0.0.0.0:8080
# use cloud-deployed app
py.test --host http://ROUTE-HERE

Scaling the architecture

For production, I would suggest splitting training and serving into seperate applications, and incorporating a fascade API. Also it would be best to use a shared cache such as Redis or Pivotal Cloud Cache to allow other applications and multiple instances of the pipeline to access the trained model. Here is a potential architecture.

A scalable model training and model serving architecture.

Author

Chris Rawles

Owner
Chris Rawles
...
Chris Rawles
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning.

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported ha

Microsoft 1.1k Jan 04, 2023
A repository of PyBullet utility functions for robotic motion planning, manipulation planning, and task and motion planning

pybullet-planning (previously ss-pybullet) A repository of PyBullet utility functions for robotic motion planning, manipulation planning, and task and

Caelan Garrett 260 Dec 27, 2022
vortex particles for simulating smoke in 2d

vortex-particles-method-2d vortex particles for simulating smoke in 2d -vortexparticles_s

12 Aug 23, 2022
YouTube Spam Detection with python

YouTube Spam Detection This code deletes spam comment on youtube videos based on two characteristics (currently) If the author of the comment has a se

MohamadReza Taalebi 5 Sep 27, 2022
A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

Daniel Formoso 5.7k Dec 30, 2022
scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly.

scikit-fem is a lightweight Python 3.7+ library for performing finite element assembly. Its main purpose is the transformation of bilinear forms into sparse matrices and linear forms into vectors.

Tom Gustafsson 297 Dec 13, 2022
Accelerating model creation and evaluation.

EmeraldML A machine learning library for streamlining the process of (1) cleaning and splitting data, (2) training, optimizing, and testing various mo

Yusuf 0 Dec 06, 2021
A repository to index and organize the latest machine learning courses found on YouTube.

📺 ML YouTube Courses At DAIR.AI we ❤️ open education. We are excited to share some of the best and most recent machine learning courses available on

DAIR.AI 9.6k Jan 01, 2023
Falken provides developers with a service that allows them to train AI that can play their games

Falken provides developers with a service that allows them to train AI that can play their games. Unlike traditional RL frameworks that learn through rewards or batches of offline training, Falken is

Google Research 223 Jan 03, 2023
Open source time series library for Python

PyFlux PyFlux is an open source time series library for Python. The library has a good array of modern time series models, as well as a flexible array

Ross Taylor 2k Jan 02, 2023
fastFM: A Library for Factorization Machines

Citing fastFM The library fastFM is an academic project. The time and resources spent developing fastFM are therefore justified by the number of citat

1k Dec 24, 2022
TensorFlow implementation of an arbitrary order Factorization Machine

This is a TensorFlow implementation of an arbitrary order (=2) Factorization Machine based on paper Factorization Machines with libFM. It supports: d

Mikhail Trofimov 785 Dec 21, 2022
pandas, scikit-learn, xgboost and seaborn integration

pandas, scikit-learn and xgboost integration.

299 Dec 30, 2022
Uber Open Source 1.6k Dec 31, 2022
Anytime Learning At Macroscale

On Anytime Learning At Macroscale Learning from sequential data dumps (key) Requirements Python 3.7 Pytorch 1.9.0 Hydra 1.1.0 (pip install hydra-core

Meta Research 8 Mar 29, 2022
CobraML: Completely Customizable A python ML library designed to give the end user full control

CobraML: Completely Customizable What is it? CobraML is a python library built on both numpy and numba. Unlike other ML libraries CobraML gives the us

Sriram Govindan 14 Dec 19, 2021
MLFlow in a Dockercontainer based on Azurite and Postgres

mlflow-azurite-postgres docker This is a MLFLow image which works with a postgres DB and a local Azure Blob Storage Instance (Azurite). This image is

2 May 29, 2022
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

Website | Documentation | Tutorials | Installation | Release Notes CatBoost is a machine learning method based on gradient boosting over decision tree

CatBoost 6.9k Jan 05, 2023
Responsible Machine Learning with Python

Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.

ph_ 624 Jan 06, 2023
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022