Repository for DCA0305, an undergraduate course about Machine Learning Workflows and Pipelines

Federal University of Rio Grande do Norte

Technology Center

Department of Computer Engineering and Automation

Machine Learning Based Systems Design

References

  • 📚 Noah Gift, Alfredo Deza. Practical MLOps: Operationalizing Machine Learning Models. [Link]
  • 📚 Chip Huyen. Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications. [Link]
  • 📚 Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines. [Link]
  • 📚 Mariano Anaya. Clean Code in Python. [Link]
  • 📚 Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. [Link]
  • 🤜 Dataquest Academic Program. [Link]
  • 😃 CS329S: Machine Learning Systems Design. [Link]
  • 🎯 Machine Learning Operations. [Link]

Lessons

Week 01: Course Outline Open in PDF

  • Git and Version Control Open in Dataquest
    • You'll learn how to: a) organize your code using version control, b) resolve conflicts in version control, c) employ Git and GitHub to collaborate with others.
    • 👊 U1T1: guided project and getting a Git repository.

Week 02: CLI Fundamentals

  • Elements of the Command Line Open in Dataquest
    • You'll learn how to: a) employ the command line for Data Science, b) modify the behavior of commands with options, c) employ glob patterns and wildcards, d) define important command-line concepts, e) navigate the filesystem, f) manage users and permissions.
  • Text Processing in the Command Line Open in Dataquest
    • You'll learn how to: a) read and explore documentation, b) perform basic text processing, c) redirect and pipe output, d) inspect files, e) define different kinds of output, f) employ streams and file descriptors.
  • 🔠 U1T2: working with the command line.
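The stream and wildcard ideas from Week 02 can be sketched in Python, the language used in the rest of the course. This is a minimal, hypothetical filter, not part of the course material: the function name `filter_lines` and the `*.csv` pattern are illustrative assumptions. It writes matching lines to standard output (file descriptor 1) and a summary to standard error (file descriptor 2), so redirecting with `>` captures only the matches. Note that `fnmatch` only applies shell-style wildcards to strings; in the shell itself, unquoted globs are expanded against the filesystem before the command runs, which is why the pattern is quoted in the usage note.

```python
"""Hypothetical command-line filter: keep lines that match a shell-style wildcard."""
import fnmatch
import sys
from typing import Iterable, TextIO


def filter_lines(lines: Iterable[str], pattern: str,
                 out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> int:
    """Write lines matching `pattern` to `out` and a summary to `err`.

    Returns the number of matching lines.
    """
    matches = 0
    for line in lines:
        name = line.rstrip("\n")
        # fnmatchcase applies shell-style wildcards (*, ?, [seq]) case-sensitively
        if fnmatch.fnmatchcase(name, pattern):
            out.write(name + "\n")
            matches += 1
    # Diagnostics go to stderr so they do not end up in piped or redirected stdout
    err.write(f"{matches} line(s) matched {pattern!r}\n")
    return matches
```

Saved as a script whose `if __name__ == "__main__":` guard calls `filter_lines(sys.stdin, sys.argv[1])`, it could be used as `ls | python filter.py '*.csv' > csv_files.txt`: the file receives only the matching names, while the summary still appears on the terminal.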

Week 03: Clean Code Principles for Data Science and Machine Learning Open in PDF

  • Outline Open in Loom
  • Coding Best Practices Open in Loom
  • Writing Clean Code Open in Loom
  • Refactoring Code Open in Loom
  • Efficient Code Open in Loom
  • Documentation Open in Loom
  • Python Code Quality Authority (PyCQA) - pycodestyle Open in Loom
  • PyCQA - pylint Open in Loom
  • PyCQA - autopep8 Open in Loom
  • PyCQA - nbQA Open in Loom
  • ▶️ Hands-on
    • 💾 Datasets [Link]
    • Writing Clean Code Jupyter
    • Exercise 01 Jupyter
    • Exercise 02 Jupyter
    • Exercise 03 Jupyter
    • Using pycodestyle Jupyter
    • Using pylint: script Python, refactored script Python
    • Functions: Advanced - Best practices for writing functions Open in Dataquest
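The refactoring and efficiency topics in Week 03 can be illustrated with a small before-and-after sketch. The example is hypothetical (the function names and the string data are assumptions, not taken from the course notebooks). The first version uses vague names and a nested loop. The refactored version uses descriptive names, type hints, a docstring, and a set intersection, which replaces the O(n·m) nested scan with roughly O(n + m) hashing.

```python
"""Before/after sketch of a refactoring for readability and efficiency."""
from typing import Iterable, List


# Before: vague names, no docstring, and a nested loop that compares
# every element of `a` with every element of `b` (O(n * m)).
# pylint would also flag the non-descriptive names here.
def f(a, b):
    r = []
    for x in a:
        for y in b:
            if x == y:
                r.append(x)
    return r


# After: descriptive names, type hints, a docstring, and a set
# intersection that runs in roughly O(n + m) time.
def find_common_items(first: Iterable[str], second: Iterable[str]) -> List[str]:
    """Return the items present in both collections, sorted and without duplicates."""
    return sorted(set(first) & set(second))
```

The refactored version also deduplicates and sorts its result, a deliberate behavior change relative to `f` that a unit test should pin down before the old function is removed.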

Week 04: Production-Ready Code Open in PDF

  • Outline Open in Loom
  • Catching Errors Open in Loom
  • Testing and Data Science Open in Loom
  • A brief introduction to pytest Open in Loom
  • Logging Open in Loom
  • Case study: testing and logging Open in Loom
  • Model Drift Open in Loom
  • Hands-on
    • Production ready code Jupyter
    • Data Visualization Fundamentals Open in Dataquest
      • You will learn: a) how to use data visualization to explore data, and b) how and when to use the most common plots.
    • Storytelling Data Visualization and Information Design Open in Dataquest
      • You will learn how to: a) create graphs using information design principles, b) create narrative data visualizations using Matplotlib, c) create visual patterns using Gestalt principles, d) control attention using pre-attentive attributes, and e) employ Matplotlib's built-in styles.
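The Week 04 topics of catching errors, logging, and testing fit together in one short sketch. It is an illustration under stated assumptions, not the course's case study: `compute_ratio` and the log format are hypothetical. The function logs the failure with context and re-raises a clearer error instead of failing silently. The `test_*` functions follow pytest's discovery convention, so saving the file as, for example, `test_ratio.py` and running `pytest` would collect them; they use plain `assert`, which pytest rewrites to give detailed failure messages (the error-path test could equally use `pytest.raises`).

```python
"""Sketch: catching errors, logging, and pytest-style tests."""
import logging

# Hypothetical log configuration; in a real project this usually lives in the entry point
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)


def compute_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, logging the outcome.

    Raises:
        ValueError: if `denominator` is zero.
    """
    try:
        result = numerator / denominator
    except ZeroDivisionError as err:
        # Log the failure with context, then raise a clearer error for callers
        logger.error("Cannot divide %s by zero", numerator)
        raise ValueError("denominator must be non-zero") from err
    logger.info("Ratio computed: %s", result)
    return result


# Tests following pytest's naming convention (test_*); run with `pytest`.
def test_compute_ratio_returns_quotient():
    assert compute_ratio(3, 4) == 0.75


def test_compute_ratio_rejects_zero_denominator():
    try:
        compute_ratio(1, 0)
    except ValueError as err:
        assert "non-zero" in str(err)
    else:
        raise AssertionError("expected ValueError")
```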
Owner
Ivanovitch Silva
I'm an experimenter by design, and very interested in technologies related to Data Science & Machine Learning, Vehicles and Complex Networks.