MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

Overview

The collaboration platform for Machine Learning


MLReef

MLReef is an ML/DL development platform containing four main sections:

  • Data Management - Fully versioned data hosting and processing infrastructure
  • Publishing code repositories - Containerized and versioned script repositories for immutable use in data pipelines
  • Experiment Manager - Experiment tracking, environments and results
  • ML-Ops - Pipelines & Orchestration solution for ML/DL jobs (K8s / Cloud / bare-metal)


To find out more about how MLReef can streamline your Machine Learning development life cycle, visit our homepage.

Data Management

  • Host your data using git / git LFS repositories.
    • Work concurrently on data
    • Fully versioned or LFS version control
    • Full view on data processing and visualization history
  • Connect your external storage to MLReef and use your data directly in pipelines
  • Data set management (access, history, pipelines)
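
As an illustration of what LFS version control means for data files: Git LFS stores the large file outside the repository and commits a small text pointer in its place. The sketch below builds such a pointer following the public Git LFS pointer format. The function name `lfs_pointer` is a hypothetical helper for this example and is not part of MLReef.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given file content.

    Hypothetical helper for illustration only: Git LFS itself creates
    these pointers when a tracked file is committed.
    """
    # The object id is the SHA-256 digest of the file content.
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    print(lfs_pointer(b"example image bytes"), end="")
```

Because the pointer changes whenever the content hash changes, every revision of a data file remains addressable through ordinary Git history.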

Publishing Code

Adding only parameter annotations to your code...

# Example of parameter annotation for an image crop function
@data_processor(
    name="Resnet50",
    author="MLReef",
    command="resnet50",
    type="ALGORITHM",
    description="CNN Model resnet50",
    visibility="PUBLIC",
    input_type="IMAGE",
    output_type="MODEL"
)
@parameter(name='input-path', type='str', required=True, defaultValue='train', description="input path")
@parameter(name='output-path', type='str', required=True, defaultValue='output', description="output path")
@parameter(name='height', type='int', required=True, defaultValue=224, description="height of cropped images in px")
@parameter(name='width', type='int', required=True, defaultValue=224, description="width of cropped images in px")
def init_params():
    pass

...and publishing your scripts gets you the following:

  • Containerization of your scripts
    • Always working scripts including easy hyperparameter access in pipelines
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
      • ArgParser for command line parameters with currently used values
      • Explicit parameters dictionary
      • Input validation and guides
  • Multiple containers based on version and code branches
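
To make the link between annotations and command line parameters more concrete, here is a minimal sketch of how decorators like these could collect metadata and turn it into an argument parser. This is not MLReef's implementation: the decorator bodies, the `_parameters` and `_metadata` attributes, and the `build_parser` helper are all assumptions made for this example.

```python
import argparse

# Maps the type names used in the annotations to Python types (assumed set).
_TYPES = {"str": str, "int": int, "float": float}


def parameter(name, type, required=False, defaultValue=None, description=""):
    """Hypothetical stand-in: record one parameter on the decorated function."""
    def decorator(func):
        existing = getattr(func, "_parameters", [])
        entry = {"name": name, "type": type, "required": required,
                 "default": defaultValue, "description": description}
        # Decorators are applied bottom-up, so prepend to keep source order.
        func._parameters = [entry] + existing
        return func
    return decorator


def data_processor(**metadata):
    """Hypothetical stand-in: attach script-level metadata to the function."""
    def decorator(func):
        func._metadata = metadata
        return func
    return decorator


def build_parser(func):
    """Build an argparse parser from the recorded annotations."""
    meta = getattr(func, "_metadata", {})
    parser = argparse.ArgumentParser(prog=meta.get("command"),
                                     description=meta.get("description"))
    for p in getattr(func, "_parameters", []):
        parser.add_argument(
            f"--{p['name']}",
            type=_TYPES[p["type"]],
            default=p["default"],
            # Only insist on the flag when no default value can fill in.
            required=p["required"] and p["default"] is None,
            help=p["description"],
        )
    return parser


@data_processor(name="Resnet50", command="resnet50", description="CNN Model resnet50")
@parameter(name="input-path", type="str", required=True, defaultValue="train", description="input path")
@parameter(name="height", type="int", required=True, defaultValue=224, description="height of cropped images in px")
def init_params():
    pass


if __name__ == "__main__":
    args = build_parser(init_params).parse_args(["--height", "128"])
    print(args.input_path, args.height)
```

Keeping the annotations as plain data on the function means the same record can feed a command line parser, input validation, or a pipeline form without touching the script body.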

Experiment Manager

  • Complete experiment setup log
    • Full source control info including non-committed local changes
    • Execution environment (including specific packages & versions)
    • Hyper-parameters
  • Full experiment output automatic capture
    • Artifacts storage and standard-output logs
    • Performance metrics on individual experiments and comparative graphs for all experiments
    • Detailed view on logs and outputs generated
  • Extensive platform support and integrations
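
The kind of information an experiment setup log contains can be sketched with the Python standard library alone. The example below is an illustration under stated assumptions, not how MLReef captures experiments: `capture_setup` and the record layout are hypothetical names chosen for this sketch.

```python
import json
import platform
from importlib import metadata


def capture_setup(hyper_parameters: dict) -> dict:
    """Collect a snapshot of the execution environment and hyper-parameters.

    Hypothetical helper for illustration; the record layout is an assumption.
    """
    # List installed distributions as "name==version", skipping broken entries.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    )
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
        "hyper_parameters": dict(hyper_parameters),
    }


if __name__ == "__main__":
    record = capture_setup({"height": 224, "width": 224})
    print(json.dumps(record, indent=2))
```

Storing such a record next to each experiment's outputs is what makes a later run comparable to, and reproducible from, an earlier one.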

ML-Ops

  • Concurrent computing pipelining
  • Governance and control
    • Access and user management
    • Single permission management
    • Resource management
  • Model management

MLReef Architecture

The MLReef ML components within the ML life cycle:

  • Data storage components, currently based on Git and Git LFS.
  • Model development based on working modules (published by the community or your team), data management, data processing, data visualization and experiment pipelines (hosted or on-prem), and model management.
  • ML-Ops orchestration, experiment and workflow reproducibility, and scalability.

Why MLReef?

MLReef is our solution to a problem we share with countless other researchers and developers in the machine learning/deep learning universe: Training production-grade deep learning models is a tangled process. MLReef tracks and controls the process by associating code version control, research projects, performance metrics, and model provenance.

We designed MLReef on best data science practices, combined with the knowledge gained from DevOps and a deep focus on collaboration.

  • Use it on a daily basis to boost collaboration and visibility in your team
  • Create a job in the cloud from any code repository with a click of a button
  • Automate processes and create pipelines to collect your experimentation logs, outputs, and data
  • Make your ML life cycle transparent by cataloging it all on the MLReef platform

Getting Started as a Developer

To start developing, continue with the developer guide.

Canonical source

The canonical source of MLReef where all development takes place is hosted on gitLab.com/mlreef/mlreef.

License

MIT License (see the License for more information)

Documentation, Community and Support

More information is available in the official documentation and on YouTube.

For examples and use cases, check these use cases or start the tutorial after registering.

If you have any questions, post on our Slack channel or tag your questions on Stack Overflow with the 'mlreef' tag.

For feature requests or bug reports, please use GitLab issues.

Additionally, you can always reach out to us via [email protected]

Contributing

Merge requests are always welcome ❤️ See more details in the MLReef Contribution Guidelines.
