Efficient Deep Learning Systems

This repository contains materials for the Efficient Deep Learning Systems course taught at the Faculty of Computer Science of HSE University and Yandex School of Data Analysis.

Syllabus

Week 1: Introduction
- Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
- Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
Week 2: Basics of distributed ML
- Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
- Seminar: Multiprocessing basics. Parallel GloVe training.
Week 3: Data-parallel training and All-Reduce
- Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
- Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
Week 4: Memory-efficient and model-parallel training
Week 5: Profiling DL code, training-time optimizations
Week 6: Basics of Python application deployment
Week 7: Software for serving neural networks
Week 8: Optimizing models for faster inference
Week 9: Experiment tracking, model and data versioning
Week 10: Testing, debugging and monitoring of models
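
As an informal illustration of the Week 3 topic, the sketch below simulates a ring-based All-Reduce (sum) in pure Python. It is not taken from the course materials: the function name `ring_all_reduce` and the one-number-per-chunk layout are assumptions made for brevity, and real training code would use a communication library rather than in-process lists.

```python
def ring_all_reduce(worker_chunks):
    """Simulate a ring All-Reduce (sum) over n workers.

    worker_chunks[i][c] is chunk c held by worker i. Each worker's data is
    assumed to be split into exactly n chunks (one number per chunk here).
    Returns the per-worker data after the collective completes.
    """
    n = len(worker_chunks)
    assert all(len(chunks) == n for chunks in worker_chunks)
    data = [list(chunks) for chunks in worker_chunks]

    # Phase 1: reduce-scatter. At step s, worker i sends chunk (i - s) % n
    # to its right neighbour, which adds it to its own copy.
    for step in range(n - 1):
        # Collect all messages first so every send in a step uses pre-step values.
        messages = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for sender, chunk, value in messages:
            data[(sender + 1) % n][chunk] += value

    # Now worker i holds the fully reduced chunk (i + 1) % n.
    # Phase 2: all-gather. Reduced chunks travel around the ring and
    # overwrite the stale partial sums on each worker.
    for step in range(n - 1):
        messages = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for sender, chunk, value in messages:
            data[(sender + 1) % n][chunk] = value

    return data


if __name__ == "__main__":
    print(ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]]))
```

Each worker sends and receives 2(n - 1) chunks in total, so the per-worker traffic stays close to twice the data size regardless of the number of workers.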

Grading

The course has four home assignments, some of which span several weeks. The final grade is a weighted sum of the per-assignment grades. Please refer to the course page of your institution for details.

Owner

Max Ryabinin