Awesome Long-Tailed Learning

Overview


This repo pays special attention to the long-tailed distribution, where labels follow a long-tailed or power-law distribution in the training dataset and/or the test dataset. Related papers are summarized, including their applications in computer vision, in particular image classification, and in extreme multi-label learning (XML), in particular text categorization.
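To make the setting concrete, here is a minimal Python sketch that assumes a Zipf-like profile: the class with frequency rank r receives roughly max_count / r**exponent training examples. The function names, the exponent, and the 20% cut-off used to define the "head" are illustrative assumptions, not taken from any paper listed here. The imbalance ratio (largest class size divided by smallest class size) is one common single-number summary of how long-tailed a dataset is.

```python
from typing import List


def power_law_class_counts(num_classes: int, max_count: int, exponent: float) -> List[int]:
    """Illustrative Zipf-like label distribution: the class with frequency rank r
    (1-based) receives about max_count / r ** exponent examples, at least 1."""
    return [max(1, int(max_count / rank ** exponent)) for rank in range(1, num_classes + 1)]


def imbalance_ratio(counts: List[int]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    return max(counts) / min(counts)


def head_fraction(counts: List[int], head_share: float = 0.2) -> float:
    """Fraction of all examples owned by the most frequent head_share of classes
    (the 20% default is an arbitrary choice for illustration)."""
    ranked = sorted(counts, reverse=True)
    num_head = max(1, int(len(ranked) * head_share))
    return sum(ranked[:num_head]) / sum(ranked)


counts = power_law_class_counts(num_classes=10, max_count=1000, exponent=1.0)
print(counts)  # [1000, 500, 333, 250, 200, 166, 142, 125, 111, 100]
print(imbalance_ratio(counts))  # 10.0
print(round(head_fraction(counts), 3))  # 0.512
```

With exponent=0.0 every class gets max_count examples (a balanced dataset); larger exponents thin out the tail, so the top 20% of classes own an ever larger share of the data.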

🔆 Updated 2021-09-27

Long-tailed Learning in Computer Vision

Type of Long-Tailed Learning Methods

Type: Meaning
TST: Two-Stage Training
IS: Instance Sampling
CBS: Class-Balanced Sampling
CLW: Class-Level Weighting
NC: Normalized Classifier
ENS: Ensemble
DA: Data Augmentation
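Three of these abbreviations are mechanical enough to sketch directly. The Python below is a minimal illustration under stated assumptions. Sampling uses the common parameterization where class j is drawn with probability proportional to n_j ** q, so q = 1 recovers instance sampling (IS) and q = 0 class-balanced sampling (CBS). Class-level weighting (CLW) uses plain inverse frequency rather than any specific paper's scheme. The normalized classifier (NC) uses a scaled cosine similarity with an arbitrary scale of 16. All function names are hypothetical. TST, ENS and DA describe whole training pipelines rather than single formulas and are not covered.

```python
import math
import random
from typing import List, Sequence


def class_sampling_probs(counts: Sequence[int], q: float) -> List[float]:
    """Class-level sampling probabilities p_j proportional to n_j ** q.

    q = 1.0 gives instance sampling (IS): every example is equally likely.
    q = 0.0 gives class-balanced sampling (CBS): every class is equally likely.
    """
    powered = [n ** q for n in counts]
    total = sum(powered)
    return [p / total for p in powered]


def per_sample_weights(labels: Sequence[int], counts: Sequence[int], q: float) -> List[float]:
    """Per-example draw weights that realise class_sampling_probs: the mass
    n_j ** q of class j is split evenly over its n_j examples."""
    return [counts[y] ** q / counts[y] for y in labels]


def draw_class_histogram(counts: Sequence[int], q: float, k: int, seed: int = 0) -> List[int]:
    """Draw k examples with replacement and count how many land in each class."""
    labels = [c for c, n in enumerate(counts) for _ in range(n)]
    weights = per_sample_weights(labels, counts, q)
    rng = random.Random(seed)
    drawn = rng.choices(labels, weights=weights, k=k)
    return [drawn.count(c) for c in range(len(counts))]


def inverse_frequency_weights(counts: Sequence[int]) -> List[float]:
    """Class-level loss weights (CLW) proportional to 1 / n_j, rescaled to mean 1.
    Assumed scheme for illustration; papers in this list use other formulas."""
    raw = [1.0 / n for n in counts]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def normalized_logits(feature: Sequence[float], class_weights: Sequence[Sequence[float]],
                      scale: float = 16.0) -> List[float]:
    """Normalized classifier (NC): scale * cos(feature, w_j) for each class j, so a
    class cannot win just because its weight vector has a large norm."""
    f_norm = math.sqrt(sum(x * x for x in feature))
    logits = []
    for w in class_weights:
        w_norm = math.sqrt(sum(x * x for x in w))
        dot = sum(a * b for a, b in zip(feature, w))
        logits.append(scale * dot / (f_norm * w_norm))
    return logits


if __name__ == "__main__":
    counts = [900, 90, 10]
    print(class_sampling_probs(counts, 1.0))   # IS: mirrors the data
    print(class_sampling_probs(counts, 0.0))   # CBS: uniform over classes
    print(draw_class_histogram(counts, q=0.0, k=30000))
    print(inverse_frequency_weights(counts))
    print(normalized_logits([3.0, 4.0], [[1.0, 0.0], [0.0, 2.0]]))
```

Drawing 30,000 examples from class counts [900, 90, 10] with q = 1.0 reproduces roughly the 90/9/1 split of the data, while q = 0.0 gives each class about a third of the draws. Inverse-frequency weights instead leave the data stream unchanged and rescale each example's loss, here making a tail example count 90 times as much as a head example.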

Long-Tailed Learning Workshops

Year Venue Title Remark
2021 CVPR Open World Vision long-tail, open-set, streaming labels
2021 CVPR Learning from Limited and Imperfect Data (L2ID) label noise, SSL, long-tail

Long-Tailed Learning Papers

Year Venue Title Remark
2021 Arxiv Learning from Long-Tailed Data with Noisy Labels
2021 ICCV Self Supervision to Distillation for Long-Tailed Visual Recognition
2021 ICCV Distilling Virtual Examples for Long-tailed Recognition
2021 CVPR Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
2021 CVPR MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
2021 CVPR Disentangling Label Distribution for Long-tailed Visual Recognition
2021 CVPR Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
2021 CVPR Seesaw Loss for Long-Tailed Instance Segmentation
2021 ICLR Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
2021 Arxiv Improving Long-Tailed Classification from Instance Level
2021 Arxiv Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning SSL, Code
2021 Arxiv ResLT: Residual Learning for Long-tailed Recognition
2021 Arxiv Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces by Google
2021 Arxiv Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition
2021 Arxiv Procrustean Training for Imbalanced Deep Learning
2021 Arxiv Balanced Knowledge Distillation for Long-tailed Learning CBS+IS, Code
2021 Arxiv Class-Balanced Distillation for Long-Tailed Visual Recognition ENS+DA+IS, by Google Research
2021 Arxiv Distributional Robustness Loss for Long-tail Learning TST+CBS
2021 CVPR Improving Calibration for Long-Tailed Recognition DA+TST, Code
2021 CVPR Distribution Alignment: A Unified Framework for Long-tail Visual Recognition TST
2021 CVPR Adversarial Robustness under Long-Tailed Distribution
2021 CVPR CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning by Google, Code, TensorFlow
2021 ICLR Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization Code
2021 ICLR Long-Tailed Recognition by Routing Diverse Distribution-Aware Experts ENS+NC, Code, by Ziwei Liu
2021 ICLR Long-Tail Learning via Logit Adjustment by Google
2021 AAAI Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2021 Arxiv Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2020 Arxiv ELF: An Early-Exiting Framework for Long-Tailed Classification
2020 CVPR Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
2020 CVPR Equalization Loss for Long-Tailed Object Recognition
2020 CVPR Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020 ICLR Decoupling representation and classifier for long-tailed recognition Code
2020 NeurIPS Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning Code
2020 NeurIPS Rethinking the Value of Labels for Improving Class-Imbalanced Learning Code
2020 CVPR BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Code
2019 NeurIPS Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss Code
2019 CVPR Large-Scale Long-Tailed Recognition in an Open World Code, bibtex, by CUHK
2018 - iNaturalist: The iNaturalist 2018 competition dataset long-tailed dataset
2017 Arxiv The Devil is in the Tails: Fine-grained Classification in the Wild
2017 NeurIPS Learning to model the tail

eXtreme Multi-label Learning for Information Retrieval

Binary Relevance

Year Venue Title Remark
2019 Machine Learning Data Scarcity, Robustness and Extreme Multi-label Classification
2019 WSDM Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017 KDD PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017 AISTATS Label Filters for Large Scale Multilabel Classification
2016 WSDM DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016 ICML PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year Venue Title Remark
2021 KDD Extreme Multi-label Learning for Semantic Matching in Product Search by Amazon, code
2020 arXiv Probabilistic Label Trees for Extreme Multi-label Classification PLT survey, code
2020 arXiv Online probabilistic label trees
2020 AISTATS LdSM: Logarithm-depth Streaming Multi-label Decision Trees Instance tree, C++ code
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks Label tree
2019 arXiv Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification Label tree
2018 ICML CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning Instance tree
2018 WWW Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising Label tree, by Manik Varma
2016 ICML Extreme F-Measure Maximization using Sparse Probability Estimates Label tree
2016 KDD Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications Instance tree
2014 KDD A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Instance tree, python implementation
2013 ICML Label Partitioning For Sublinear Ranking Label tree
2013 WWW Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages Instance tree, Random Forest, Gini Index
2011 NeurIPS Efficient label tree learning for large scale object recognition Label tree, multi-class
2010 NeurIPS Label embedding trees for large multi-class tasks Label tree, multi-class
2008 ECML Workshop Effective and Efficient Multilabel Classification in Domains with Large Number of Labels Label tree

Embedding-based Methods

Year Venue Title Remark
2019 AAAI Distributional Semantics Meets Multi-Label Learning bibtex
2019 arXiv Ranking-Based Autoencoder for Extreme Multi-label Classification
2019 NeurIPS Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces by Google Research
2017 KDD AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015 NeurIPS Sparse Local Embeddings for Extreme Multi-label Classification
2014 ICML Large-scale Multi-label Learning with Missing Labels
2014 ICML Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013 ICML Efficient Multi-label Classification with Many Labels
2012 NeurIPS Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011 IJCAI WSABIE: Scaling Up To Large Vocabulary Image Annotation bibtex
2009 NeurIPS Multi-Label Prediction via Compressed Sensing
2008 KDD Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year Venue Title Remark
2020 KDD Large-Scale Training System for 100-Million Classification at Alibaba Applied Data Science Track
2020 arXiv SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020 ICLR Extreme Classification via Adversarial Softmax Approximation
2019 AISTATS Stochastic Negative Mining for Learning with Large Output Spaces by Google
2019 NeurIPS Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products Rice University, bibtex
2019 arXiv An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019 arXiv Accelerating Extreme Classification via Adaptive Feature Agglomeration bibtex, authors from IIT
2019 SDM Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization code, bibtex

Novel XML Settings

Year Venue Title Remark
2020 arXiv Extreme Multi-label Classification from Aggregated Labels by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020 arXiv Unbiased Loss Functions for Extreme Classification With Missing Labels by Rohit Babbar. Missing labels
2020 ICML Deep Streaming Label Learning code, by Dacheng Tao, streaming multi-label learning
2016 arXiv Streaming Label Learning for Modeling Labels on the Fly by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year Venue Title Remark
2019 ICML Sparse Extreme Multi-label Learning with Oracle Property Code, by Weiwei Liu
2019 NeurIPS Multilabel reductions: what is my loss optimising? bibtex, by Google

Text Classification

Year Venue Title Remark
2021 ICML SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels
2020 KDD Correlation Networks for Extreme Multi-label Text Classification code
2020 arXiv GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020 ICML Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification code
2019 ACL Large-Scale Multi-Label Text Classification on EU Legislation Eur-Lex 4.3K, bibtex
2019 arXiv X-BERT: eXtreme Multi-label Text Classification with BERT code by Yiming Yang, Inderjit Dhillon
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018 EMNLP Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces few-shot, zero-shot, evaluation metric
2018 NeurIPS A no-regret generalization of hierarchical softmax to extreme multi-label classification code, PLT code
2017 SIGIR Deep Learning for Extreme Multi-label Text Classification by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year Venue Title Remark
2019 ICML DL2: Training and Querying Neural Networks with Logic
2015 KDD Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010 KDD Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year Venue Title Remark
2020 ECCV Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year Venue Title Remark
2021 Arxiv Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year Venue Title Remark
2019 Dagstuhl Seminar 18291 Extreme Classification

Survey References:

  1. https://arxiv.org/pdf/1901.00248.pdf
  2. http://www.iith.ac.in/~saketha/research/AkshatMTP2018.pdf
  3. http://manikvarma.org/pubs/bengio19.pdf
  4. The Emerging Trends of Multi-Label Learning

XML Datasets link

Extreme Classification Workshops link

Owner
Stomach_ache