Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

Overview

TailCalibX : Feature Generation for Long-tail Classification

by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

[arXiv] [Code] [pip Package] [Video] TailCalibX methodology

Table of contents

🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is just TailCalib employed multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. In order to mimic that, three things must be done at every epoch in the following order:

  1. Collect all the features from your dataloader.
  2. Use the tailcalib package to make the features balanced by generating samples.
  3. Train the classifier.
  4. Repeat.

πŸ’» Installation

Use the package manager pip to install tailcalib.

pip install tailcalib

πŸ‘¨β€πŸ’» Example Code

Check the instruction here for a much more detailed python package information.

# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")   # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200,100)
y = np.random.randint(0,10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")

πŸ§ͺ Advanced Usage

βœ” Things to do before you run the code from this repo

  • Change the data_root for your dataset in main.py.
  • If you are using wandb logging (Weights & Biases), make sure to change the wandb.init in main.py accordingly.

πŸ“€ How to use?

  • For just the methods proposed in this paper :
    • For CIFAR100-LT: run_TailCalibX_CIFAR100-LT.sh
    • For mini-ImageNet-LT : run_TailCalibX_mini-ImageNet-LT.sh
  • For all the results show in the paper :
    • For CIFAR100-LT: run_all_CIFAR100-LT.sh
    • For mini-ImageNet-LT : run_all_mini-ImageNet-LT.sh

πŸ“š How to create the mini-ImageNet-LT dataset?

Check Notebooks/Create_mini-ImageNet-LT.ipynb for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.

βš™ Arguments

  • --seed : Select seed for fixing it.

    • Default : 1
  • --gpu : Select the GPUs to be used.

    • Default : "0,1,2,3"
  • --experiment: Experiment number (Check 'libs/utils/experiment_maker.py').

    • Default : 0.1
  • --dataset : Dataset number.

    • Choices : 0 - CIFAR100, 1 - mini-imagenet
    • Default : 0
  • --imbalance : Select Imbalance factor.

    • Choices : 0: 1, 1: 100, 2: 50, 3: 10
    • Default : 1
  • --type_of_val : Choose which dataset split to use.

    • Choices: "vt": val_from_test, "vtr": val_from_train, "vit": val_is_test
    • Default : "vit"
  • --cv1 to --cv9 : Custom variable to use in experiments - purpose changes according to the experiment.

    • Default : "1"
  • --train : Run training sequence

    • Default : False
  • --generate : Run generation sequence

    • Default : False
  • --retraining : Run retraining sequence

    • Default : False
  • --resume : Will resume from the 'latest_model_checkpoint.pth' and wandb if applicable.

    • Default : False
  • --save_features : Collect feature representations.

    • Default : False
  • --save_features_phase : Dataset split of representations to collect.

    • Choices : "train", "val", "test"
    • Default : "train"
  • --config : If you have a yaml file with appropriate config, provide the path here. Will override the 'experiment_maker'.

    • Default : None

πŸ‹οΈβ€β™‚οΈ Trained weights

Experiment CIFAR100-LT (ResNet32, seed 1, Imb 100) mini-ImageNet-LT (ResNeXt50)
TailCalib Git-LFS Git-LFS
TailCalibX Git-LFS Git-LFS
CBD + TailCalibX Git-LFS Git-LFS

πŸͺ€ Results on a Toy Dataset

Open In Colab

The higher the Imb ratio, the more imbalanced the dataset is. Imb ratio = maximum_sample_count / minimum_sample_count.

Check this notebook to play with the toy example from which the plot below was generated.

🌴 Directory Tree

TailCalibX
β”œβ”€β”€ libs
β”‚   β”œβ”€β”€ core
β”‚   β”‚   β”œβ”€β”€ ce.py
β”‚   β”‚   β”œβ”€β”€ core_base.py
β”‚   β”‚   β”œβ”€β”€ ecbd.py
β”‚   β”‚   β”œβ”€β”€ modals.py
β”‚   β”‚   β”œβ”€β”€ TailCalib.py
β”‚   β”‚   └── TailCalibX.py
β”‚   β”œβ”€β”€ data
β”‚   β”‚   β”œβ”€β”€ dataloader.py
β”‚   β”‚   β”œβ”€β”€ ImbalanceCIFAR.py
β”‚   β”‚   └── mini-imagenet
β”‚   β”‚       β”œβ”€β”€ 0.01_test.txt
β”‚   β”‚       β”œβ”€β”€ 0.01_train.txt
β”‚   β”‚       └── 0.01_val.txt
β”‚   β”œβ”€β”€ loss
β”‚   β”‚   β”œβ”€β”€ CosineDistill.py
β”‚   β”‚   └── SoftmaxLoss.py
β”‚   β”œβ”€β”€ models
β”‚   β”‚   β”œβ”€β”€ CosineDotProductClassifier.py
β”‚   β”‚   β”œβ”€β”€ DotProductClassifier.py
β”‚   β”‚   β”œβ”€β”€ ecbd_converter.py
β”‚   β”‚   β”œβ”€β”€ ResNet32Feature.py
β”‚   β”‚   β”œβ”€β”€ ResNext50Feature.py
β”‚   β”‚   └── ResNextFeature.py
β”‚   β”œβ”€β”€ samplers
β”‚   β”‚   └── ClassAwareSampler.py
β”‚   └── utils
β”‚       β”œβ”€β”€ Default_config.yaml
β”‚       β”œβ”€β”€ experiments_maker.py
β”‚       β”œβ”€β”€ globals.py
β”‚       β”œβ”€β”€ logger.py
β”‚       └── utils.py
β”œβ”€β”€ LICENSE
β”œβ”€β”€ main.py
β”œβ”€β”€ Notebooks
β”‚   β”œβ”€β”€ Create_mini-ImageNet-LT.ipynb
β”‚   └── toy_example.ipynb
β”œβ”€β”€ readme_assets
β”‚   β”œβ”€β”€ method.svg
β”‚   └── toy_example_output.svg
β”œβ”€β”€ README.md
β”œβ”€β”€ run_all_CIFAR100-LT.sh
β”œβ”€β”€ run_all_mini-ImageNet-LT.sh
β”œβ”€β”€ run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh

Ignored tailcalib_pip as it is for the tailcalib pip package.

πŸ“ƒ Citation

@inproceedings{rahul2021tailcalibX,
    title   = {{Feature Generation for Long-tail Classification}},
    author  = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
    booktitle = {ICVGIP},
    year = {2021}
}

πŸ‘ Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

❀ About me

Rahul Vigneswaran

✨ Extras

🐝 Long-tail buzz : If you are interested in deep learning research which involves long-tailed / imbalanced dataset, take a look at Long-tail buzz to learn about the recent trending papers in this field.

πŸ“ License

MIT

Owner
Rahul Vigneswaran
Rahul Vigneswaran
Self-Supervised Image Denoising via Iterative Data Refinement

Self-Supervised Image Denoising via Iterative Data Refinement Yi Zhang1, Dasong Li1, Ka Lung Law2, Xiaogang Wang1, Hongwei Qin2, Hongsheng Li1 1CUHK-S

Zhang Yi 72 Jan 01, 2023
Turning pixels into virtual points for multimodal 3D object detection.

Multimodal Virtual Point 3D Detection Turning pixels into virtual points for multimodal 3D object detection. Multimodal Virtual Point 3D Detection, Ti

Tianwei Yin 204 Jan 08, 2023
a reimplementation of LiteFlowNet in PyTorch that matches the official Caffe version

pytorch-liteflownet This is a personal reimplementation of LiteFlowNet [1] using PyTorch. Should you be making use of this work, please cite the paper

Simon Niklaus 365 Dec 31, 2022
Inferring Lexicographically-Ordered Rewards from Preferences

Inferring Lexicographically-Ordered Rewards from Preferences Code author: Alihan HΓΌyΓΌk ([e

Alihan HΓΌyΓΌk 1 Feb 13, 2022
Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany 0 Nov 22, 2021
FB-tCNN for SSVEP Recognition

FB-tCNN for SSVEP Recognition Here are the codes of the tCNN and FB-tCNN in the paper "Filter Bank Convolutional Neural Network for Short Time-Window

Wenlong Ding 12 Dec 14, 2022
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki

349 Dec 08, 2022
This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Feedback Prize - Evaluating Student Writing This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing. The

Udbhav Bamba 41 Dec 14, 2022
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.

Microsoft 7.6k Jan 01, 2023
Character-Input - Create a program that asks the user to enter their name and their age

Character-Input Create a program that asks the user to enter their name and thei

PyLaboratory 0 Feb 06, 2022
A new data augmentation method for extreme lighting conditions.

Random Shadows and Highlights This repo has the source code for the paper: Random Shadows and Highlights: A new data augmentation method for extreme l

Osama Mazhar 35 Nov 26, 2022
Training BERT with Compute/Time (Academic) Budget

Training BERT with Compute/Time (Academic) Budget This repository contains scripts for pre-training and finetuning BERT-like models with limited time

Intel Labs 263 Jan 07, 2023
πŸ€ Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

πŸ€ Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐

xmu-xiaoma66 7.7k Jan 05, 2023
Malmo Collaborative AI Challenge - Team Pig Catcher

The Malmo Collaborative AI Challenge - Team Pig Catcher Approach The challenge involves 2 agents who can either cooperate or defect. The optimal polic

Kai Arulkumaran 66 Jun 29, 2022
Python library for analysis of time series data including dimensionality reduction, clustering, and Markov model estimation

deeptime Releases: Installation via conda recommended. conda install -c conda-forge deeptime pip install deeptime Documentation: deeptime-ml.github.io

495 Dec 28, 2022
A crossplatform menu bar application using mpv as DLNA Media Renderer.

Macast Chinese README A menu bar application using mpv as DLNA Media Renderer. Install MacOS || Windows || Debian Download link: Macast release latest

4.4k Jan 01, 2023
TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

TorchOk - The toolkit for fast Deep Learning experiments in Computer Vision

52 Dec 23, 2022
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition

Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition Official implementation of the Efficient Conforme

Maxime Burchi 145 Dec 30, 2022
Intent parsing and slot filling in PyTorch with seq2seq + attention

PyTorch Seq2Seq Intent Parsing Reframing intent parsing as a human - machine translation task. Work in progress successor to torch-seq2seq-intent-pars

Sean Robertson 160 Jan 07, 2023
Vignette is a face tracking software for characters using osu!framework.

Vignette is a face tracking software for characters using osu!framework. Unlike most solutions, Vignette is: Made with osu!framework, the game framewo

Vignette 412 Dec 28, 2022