Paper and Code for "Curriculum Learning by Optimizing Learning Dynamics" (AISTATS 2021)

Curriculum Learning by Optimizing Learning Dynamics (DoCL)

AISTATS 2021 paper:

Title: Curriculum Learning by Optimizing Learning Dynamics [pdf] [appendix] [slides]
Authors: Tianyi Zhou, Shengjie Wang, Jeff A. Bilmes
Institute: University of Washington, Seattle

@inproceedings{
    zhou2020docl,
    title={Curriculum Learning by Optimizing Learning Dynamics},
    author={Tianyi Zhou and Shengjie Wang and Jeff A. Bilmes},
    booktitle={Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS)},
    year={2021},
}

Abstract
We study a novel curriculum learning scheme where in each round, samples are selected to achieve the greatest progress and fastest learning speed towards the ground-truth on all available samples. Inspired by an analysis of optimization dynamics under gradient flow for both regression and classification, the problem reduces to selecting training samples by a score computed from samples’ residual and linear temporal dynamics. It encourages the model to focus on the samples at learning frontier, i.e., those with large loss but fast learning speed. The scores in discrete time can be estimated via already-available byproducts of training, and thus require a negligible amount of extra computation. We discuss the properties and potential advantages of the proposed dynamics optimization via current deep learning theory and empirical study. By integrating it with cyclical training of neural networks, we introduce "dynamics-optimized curriculum learning (DoCL)", which selects the training set for each step by weighted sampling based on the scores. On nine different datasets, DoCL significantly outperforms random mini-batch SGD and recent curriculum learning methods both in terms of efficiency and final performance.

Usage

Prerequisites

Instructions

  • For now, we keep all the DoCL code in docl.py. It supports multiple datasets and models. You can add your own options.
  • Example scripts to run DoCL on CIFAR10/100 for training WideResNet-28-10 can be found in docl_cifar.sh.
  • Training is split into multiple episodes, each a run of epochs over which the learning rate follows cosine annealing from --lr_max down to --lr_min. Set the episode boundaries by epoch number, for example, --epochs 300 --schedule 0 5 10 15 20 30 40 60 90 140 210 300.
  • DoCL shrinks the selected subset over the training episodes, starting from n (the total number of training samples). Control the reduction with, for example, --k 1.0 --dk 0.1 --mk 0.3, which starts from a subset of size (k * n) and multiplies it by (1 - dk) until it reaches (mk * n).
  • To shrink the subset below n in earlier epochs and save more computation, add --use_centrality, which prunes the DoCL-selected subset to a few diverse and representative samples according to their centrality (defined on the pairwise similarity between samples). Set the selection ratio and how it changes every episode: for example, --select_ratio 0.5 --select_ratio_rate 1.1 halves the DoCL-selected subset in the first non-warm-starting episode and then multiplies the ratio by 1.1 in every later episode until it reaches 1.
  • Centrality is used here as an alternative to the facility location function in the paper to encourage diversity. The latter requires an external submodular maximization library and extra computation. We may add a submodular maximization option in the future, but centrality performs well enough on most tested tasks.
  • Self-supervised learning may help in some scenarios. Two types of self-supervision regularizations are supported, i.e., --consistency and --contrastive.
  • To try DoCL on noisy-label learning (though it is not the focus of the paper), add --use_noisylabel and specify the noise type and ratio with --label_noise_type and --label_noise_rate.
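
The scheduling flags described above can be illustrated with a short Python sketch. It is not taken from docl.py: the function names (episode_lr, subset_fraction, select_ratio) and the learning-rate values in the usage lines are hypothetical. It also assumes that the learning rate is updated once per epoch and that the subset size and selection ratio change once per episode, counted from the first episode they apply to. The actual code may differ on these points.

```python
import math


def episode_lr(epoch, schedule, lr_max, lr_min):
    """Cosine-annealed learning rate that restarts at each episode boundary.

    `schedule` lists the episode boundaries in epochs, as passed to --schedule.
    Assumption: the rate is updated once per epoch, not once per iteration.
    """
    for start, end in zip(schedule[:-1], schedule[1:]):
        if start <= epoch < end:
            progress = (epoch - start) / (end - start)
            cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
            return lr_min + (lr_max - lr_min) * cosine
    raise ValueError(f"epoch {epoch} is outside the schedule")


def subset_fraction(episode, k, dk, mk):
    """Fraction of the n training samples kept in a given episode.

    Starts at k, is multiplied by (1 - dk) per episode, and never drops below mk.
    """
    return max(mk, k * (1.0 - dk) ** episode)


def select_ratio(episode, ratio, rate):
    """Extra pruning ratio applied with --use_centrality.

    `episode` counts from the first non-warm-starting episode (hypothetical
    indexing). Starts at `ratio`, grows by `rate` per episode, capped at 1.
    """
    return min(1.0, ratio * rate ** episode)


if __name__ == "__main__":
    schedule = [0, 5, 10, 15, 20, 30, 40, 60, 90, 140, 210, 300]
    lr_max, lr_min = 0.1, 0.0001  # placeholder values, not the repo defaults
    n = 50000  # CIFAR training set size

    for episode, start in enumerate(schedule[:-1]):
        kept = subset_fraction(episode, k=1.0, dk=0.1, mk=0.3)
        pruned = kept * select_ratio(episode, ratio=0.5, rate=1.1)
        lr = episode_lr(start, schedule, lr_max, lr_min)
        print(f"episode {episode:2d}  start epoch {start:3d}  lr {lr:.4f}  "
              f"DoCL subset {int(kept * n):5d}  after centrality {int(pruned * n):5d}")
```

Running the script prints one line per episode, which is a quick way to check how a given combination of --schedule, --k, --dk, --mk, --select_ratio and --select_ratio_rate would shrink the training subset before committing to a long run.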

License
This project is licensed under the terms of the MIT license.
