PyTorch implementation of TailCalibX : Feature Generation for Long-tail Classification

Last update: Jan 02, 2023

Overview

TailCalibX : Feature Generation for Long-tail Classification

by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi

🐣 Easy Usage (Recommended way to use our method)
- 💻 Installation
- 👨‍💻 Example Code
🧪 Advanced Usage
🏋️‍♂️ Trained weights
🪀 Results on a Toy Dataset
🌴 Directory Tree
📃 Citation
👁 Contributing
❤ About me
✨ Extras
📝 License

🐣 Easy Usage (Recommended way to use our method)

⚠ Caution: TailCalibX is just TailCalib employed multiple times. Specifically, we generate a set of features once every epoch and use them to train the classifier. In order to mimic that, three things must be done at every epoch in the following order:

1. Collect all the features from your dataloader.
2. Use the tailcalib package to make the features balanced by generating samples.
3. Train the classifier.
4. Repeat.
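The per-epoch order above can be sketched as a small loop. This is only an illustration of the control flow, not the repository's training code: `collect_features`, `balance_by_resampling`, `run_epochs`, and the `backbone` / `train_classifier` callables are hypothetical names, and the balancing step is a plain random-oversampling stand-in rather than TailCalib. With the real package, that step would be the `generate` call shown in the Example Code section.

```python
import random
from collections import Counter


def collect_features(loader, backbone):
    """Step 1: pass every batch through the backbone and gather features and labels."""
    feats, labels = [], []
    for xs, ys in loader:
        feats.extend(backbone(x) for x in xs)
        labels.extend(ys)
    return feats, labels


def balance_by_resampling(feats, labels, rng):
    """Step 2 stand-in (NOT TailCalib): duplicate tail-class features until
    every class has as many samples as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_feats, out_labels = list(feats), list(labels)
    for cls, n in counts.items():
        pool = [f for f, l in zip(feats, labels) if l == cls]
        for _ in range(target - n):
            out_feats.append(rng.choice(pool))
            out_labels.append(cls)
    return out_feats, out_labels


def run_epochs(loader, backbone, train_classifier, num_epochs, seed=0):
    """Repeat collect -> balance -> train once per epoch, in that order."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_epochs):
        feats, labels = collect_features(loader, backbone)          # step 1
        feats, labels = balance_by_resampling(feats, labels, rng)   # step 2
        train_classifier(feats, labels)                             # step 3
        history.append(Counter(labels))                             # class counts seen by the classifier
    return history
```

The point of the sketch is that balancing is redone every epoch, so the classifier sees a freshly generated balanced set each time instead of one fixed set.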

💻 Installation

Use the package manager pip to install tailcalib.

pip install tailcalib

👨‍💻 Example Code

See the instructions here for more detailed information about the Python package.

# Import
from tailcalib import tailcalib

# Initialize
a = tailcalib(base_engine="numpy")   # Options: "numpy", "pytorch"

# Imbalanced random fake data
import numpy as np
X = np.random.rand(200,100)
y = np.random.randint(0,10, (200,))

# Balancing the data using "tailcalib"
feat, lab, gen = a.generate(X=X, y=y)

# Output comparison
print(f"Before: {np.unique(y, return_counts=True)}")
print(f"After: {np.unique(lab, return_counts=True)}")
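The raw `np.unique` tuples printed above can be hard to read for many classes. As a small convenience, the sketch below tabulates per-class counts before and after balancing. `summarize_balance` is a hypothetical helper, not part of the tailcalib package, and it assumes the labels are a 1-D sequence of integers.

```python
from collections import Counter


def summarize_balance(labels_before, labels_after):
    """Return {class: (count_before, count_after, num_added)} for every class."""
    before = Counter(int(l) for l in labels_before)
    after = Counter(int(l) for l in labels_after)
    return {
        cls: (before.get(cls, 0), after[cls], after[cls] - before.get(cls, 0))
        for cls in sorted(after)
    }


# Toy labels: class 1 is the tail class and gains two samples.
for cls, (n_before, n_after, n_added) in summarize_balance([0, 0, 0, 1], [0, 0, 0, 1, 1, 1]).items():
    print(f"class {cls}: {n_before} -> {n_after} (+{n_added})")
```

With the example above, `summarize_balance(y, lab)` would give the same kind of table for the generated data.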

🧪 Advanced Usage

✔ Things to do before you run the code from this repo

Change the data_root for your dataset in main.py.
If you are using wandb logging (Weights & Biases), make sure to change the wandb.init in main.py accordingly.

📀 How to use?

For just the methods proposed in this paper :
- For CIFAR100-LT: run_TailCalibX_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_TailCalibX_mini-ImageNet-LT.sh
For all the results shown in the paper :
- For CIFAR100-LT: run_all_CIFAR100-LT.sh
- For mini-ImageNet-LT : run_all_mini-ImageNet-LT.sh

📚 How to create the mini-ImageNet-LT dataset?

Check Notebooks/Create_mini-ImageNet-LT.ipynb for the script that generates the mini-ImageNet-LT dataset with varying imbalance ratios and train-test-val splits.

⚙ Arguments

--seed : Random seed to fix for reproducibility.
- Default : 1
--gpu : Select the GPUs to be used.
- Default : "0,1,2,3"
--experiment : Experiment number (Check 'libs/utils/experiments_maker.py').
- Default : 0.1
--dataset : Dataset number.
- Choices : 0 - CIFAR100, 1 - mini-imagenet
- Default : 0
--imbalance : Select Imbalance factor.
- Choices : 0: 1, 1: 100, 2: 50, 3: 10
- Default : 1
--type_of_val : Choose which dataset split to use.
- Choices: "vt": val_from_test, "vtr": val_from_train, "vit": val_is_test
- Default : "vit"
--cv1 to --cv9 : Custom variable to use in experiments - purpose changes according to the experiment.
- Default : "1"
--train : Run training sequence
- Default : False
--generate : Run generation sequence
- Default : False
--retraining : Run retraining sequence
- Default : False
--resume : Resume from 'latest_model_checkpoint.pth' (and from the wandb run, if applicable).
- Default : False
--save_features : Collect feature representations.
- Default : False
--save_features_phase : Dataset split of representations to collect.
- Choices : "train", "val", "test"
- Default : "train"
--config : If you have a yaml file with appropriate config, provide the path here. Will override the 'experiment_maker'.
- Default : None
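As a reading aid, the arguments listed above can be mirrored in an `argparse` parser. This is a sketch that assumes `main.py` uses `argparse` and that the options defaulting to False are on/off switches. The actual definitions in the repository may differ, and `build_parser` and `IMBALANCE_FACTORS` are hypothetical names.

```python
import argparse

# Imbalance index -> imbalance factor, as listed for --imbalance.
IMBALANCE_FACTORS = {0: 1, 1: 100, 2: 50, 3: 10}


def build_parser():
    """Build a parser with the documented names, choices, and defaults."""
    p = argparse.ArgumentParser(description="Sketch of the documented arguments")
    p.add_argument("--seed", type=int, default=1)
    p.add_argument("--gpu", type=str, default="0,1,2,3")
    p.add_argument("--experiment", type=float, default=0.1)
    p.add_argument("--dataset", type=int, choices=[0, 1], default=0)
    p.add_argument("--imbalance", type=int, choices=sorted(IMBALANCE_FACTORS), default=1)
    p.add_argument("--type_of_val", choices=["vt", "vtr", "vit"], default="vit")
    for i in range(1, 10):  # --cv1 ... --cv9
        p.add_argument(f"--cv{i}", type=str, default="1")
    # Assumption: these are boolean switches that default to False.
    for flag in ("train", "generate", "retraining", "resume", "save_features"):
        p.add_argument(f"--{flag}", action="store_true")
    p.add_argument("--save_features_phase", choices=["train", "val", "test"], default="train")
    p.add_argument("--config", type=str, default=None)
    return p


args = build_parser().parse_args(["--train", "--dataset", "1", "--imbalance", "2"])
print(args.dataset, IMBALANCE_FACTORS[args.imbalance], args.train)
```

Under these assumptions, the parsed example selects mini-imagenet (dataset 1) with an imbalance factor of 50 and runs the training sequence.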

🏋️‍♂️ Trained weights

Experiment	CIFAR100-LT (ResNet32, seed 1, Imb 100)	mini-ImageNet-LT (ResNeXt50)
TailCalib	Git-LFS	Git-LFS
TailCalibX	Git-LFS	Git-LFS
CBD + TailCalibX	Git-LFS	Git-LFS

🪀 Results on a Toy Dataset

The higher the Imb ratio, the more imbalanced the dataset is. Imb ratio = maximum_sample_count / minimum_sample_count.

Check this notebook to play with the toy example from which the plot below was generated.
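The Imb ratio defined above can be computed directly from per-class sample counts. The sketch below also builds a long-tailed count profile with exponential decay, which is a common convention for CIFAR-LT style datasets. Whether this repository's data scripts use exactly this profile is an assumption, and both function names are hypothetical.

```python
def imbalance_ratio(counts):
    """Imb ratio = maximum_sample_count / minimum_sample_count."""
    return max(counts) / min(counts)


def long_tail_counts(num_classes, max_count, imb_ratio):
    """Per-class counts decaying exponentially from max_count down to
    roughly max_count / imb_ratio (assumed profile, for illustration only)."""
    if num_classes == 1:
        return [max_count]
    return [
        int(max_count * (1.0 / imb_ratio) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


counts = long_tail_counts(num_classes=100, max_count=500, imb_ratio=100)
print(counts[0], counts[-1], imbalance_ratio(counts))
```

A balanced dataset has an Imb ratio of 1, and larger values mean that the head classes increasingly outnumber the tail classes.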

🌴 Directory Tree

TailCalibX
├── libs
│   ├── core
│   │   ├── ce.py
│   │   ├── core_base.py
│   │   ├── ecbd.py
│   │   ├── modals.py
│   │   ├── TailCalib.py
│   │   └── TailCalibX.py
│   ├── data
│   │   ├── dataloader.py
│   │   ├── ImbalanceCIFAR.py
│   │   └── mini-imagenet
│   │       ├── 0.01_test.txt
│   │       ├── 0.01_train.txt
│   │       └── 0.01_val.txt
│   ├── loss
│   │   ├── CosineDistill.py
│   │   └── SoftmaxLoss.py
│   ├── models
│   │   ├── CosineDotProductClassifier.py
│   │   ├── DotProductClassifier.py
│   │   ├── ecbd_converter.py
│   │   ├── ResNet32Feature.py
│   │   ├── ResNext50Feature.py
│   │   └── ResNextFeature.py
│   ├── samplers
│   │   └── ClassAwareSampler.py
│   └── utils
│       ├── Default_config.yaml
│       ├── experiments_maker.py
│       ├── globals.py
│       ├── logger.py
│       └── utils.py
├── LICENSE
├── main.py
├── Notebooks
│   ├── Create_mini-ImageNet-LT.ipynb
│   └── toy_example.ipynb
├── readme_assets
│   ├── method.svg
│   └── toy_example_output.svg
├── README.md
├── run_all_CIFAR100-LT.sh
├── run_all_mini-ImageNet-LT.sh
├── run_TailCalibX_CIFAR100-LT.sh
└── run_TailCalibX_mini-imagenet-LT.sh

The tailcalib_pip directory is omitted from the tree above because it only contains the source of the tailcalib pip package.

📃 Citation

@inproceedings{rahul2021tailcalibX,
    title   = {{Feature Generation for Long-tail Classification}},
    author  = {Rahul Vigneswaran and Marc T. Law and Vineeth N. Balasubramanian and Makarand Tapaswi},
    booktitle = {ICVGIP},
    year = {2021}
}

👁 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

❤ About me

Rahul Vigneswaran

✨ Extras

🐝 Long-tail buzz : If you are interested in deep learning research that involves long-tailed / imbalanced datasets, take a look at Long-tail buzz to learn about recent trending papers in this field.

📝 License

MIT
