Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Overview

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

[Paper][Code]

We implement the classification, object detection, and instance segmentation tasks based on our cvpods. Users should install cvpods first and then run the experiments in this repo.

Changelog

  • 4.23.2021 Update DisAlign on LVIS v0.5 (Mask R-CNN + Res50)
  • 4.12.2021 Update the README

0. How to Use

  • Step-1: Install the latest cvpods.
  • Step-2: cd cvpods
  • Step-3: Prepare dataset for different tasks.
  • Step-4: git clone https://github.com/Megvii-BaseDetection/DisAlign playground_disalign
  • Step-5: Enter one folder and run pods_train --num-gpus 8
  • Step-6: Use pods_test --num-gpus 8 to evaluate the last checkpoint

1. Image Classification

We support the following three datasets:

  • ImageNet-LT Dataset
  • iNaturalist-2018 Dataset
  • Place-LT Dataset

We refer the user to CLS_README for more details.

2. Object Detection/Instance Segmentation

We support the two versions of the LVIS dataset:

  • LVIS v0.5
  • LVIS v1.0

Highlight

  1. To speed up the evaluation on the LVIS dataset, we provide a C++-optimized evaluation API by modifying the coco_eval (C++) in cvpods.
     • The C++ version of the lvis_eval API saves ~30% of the time when calculating the mAP.
  2. We provide support for the AP_fixed and AP_pool metrics proposed in large-vocab-devil.
  3. We will support more recent works on long-tail detection in this project (e.g., EQLv2, CenterNet2) in the future.

We refer the user to DET_README for more details.

3. Semantic Segmentation

We adopt mmsegmentation as the codebase for running all semantic segmentation experiments of DisAlign. Currently, users should use DisAlign_Seg for these experiments. We will add support for them in cvpods in the future.

Acknowledgement

Thanks to the following projects:

Citing DisAlign

If you are using DisAlign in your research or wish to refer to the baseline results published in this repo, please use the following BibTeX entry.

@inproceedings{zhang2021disalign,
  title={Distribution Alignment: A Unified Framework for Long-tail Visual Recognition.},
  author={Zhang, Songyang and Li, Zeming and Yan, Shipeng and He, Xuming and Sun, Jian},
  booktitle={CVPR},
  year={2021}
}

License

This repo is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Comments
  • scale in cosine classifier


    Hi, thanks for your great work! I notice you use the cosine classifier in many experiments and it can get a better baseline. The formula is as follows

    [formula image: cosine classifier with a scale factor s]

    I am wondering what the value of s is?

    opened by L1aoXingyu 5
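    For context, since the formula image above may not render, a scale-augmented cosine classifier is commonly written in the standard form below; the actual value of s used in the experiments is exactly what the question asks about and is not given here.

    z_j = s \cdot \frac{w_j^{\top} x}{\lVert w_j \rVert \, \lVert x \rVert}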
  •  Is it correct to freeze the weight and bias of the DisAlign Linear Layer as well?


    Hello. Thank you for your project! I'm testing your code on my custom dataset. My task is classification. I have a question about your code implementation.

    https://github.com/Megvii-BaseDetection/DisAlign/blob/a2fc3500a108cb83e3942293a5675c97ab3a2c6e/classification/imagenetlt/resnext50/resx50.scratch.imagenet_lt.224size.90e.disalign.10e/net.py#L56-L62

    From my understanding, in stage 2 the linear layer used in stage 1 is removed and a DisAlign linear layer is added, and everything is frozen except for logit_scale, logit_bias, and the confidence_layer. At this point, the weight and bias of DisAlignLinear (self.weight, self.bias) are also frozen. Is my understanding correct?

    If so, are the weight and bias of DisAlignLinearLayer fixed after the initialization? (The weight and bias of the linear layer in stage 1 are not copied either)

    If my understanding is correct, why is the weight of DisAlignLinear also frozen?

    I will wait for your reply. thanks!

    opened by jeongHwarr 4
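    For readers following this thread, below is a minimal sketch of a DisAlign-style calibrated classifier, written only to illustrate the stage-2 setup described in the question (base classifier frozen; logit_scale, logit_bias, and the confidence layer trained). It is not the repository's DisAlignLinear implementation: the class name, attribute names, and the confidence-weighted blending form are assumptions based on the paper's adaptive calibration.

    import torch
    import torch.nn as nn

    class DisAlignStyleLinear(nn.Module):
        """Illustrative DisAlign-style calibrated classifier (not the repo's exact code)."""

        def __init__(self, in_features: int, num_classes: int):
            super().__init__()
            self.fc = nn.Linear(in_features, num_classes)        # base classifier, frozen in stage 2
            self.logit_scale = nn.Parameter(torch.ones(num_classes))
            self.logit_bias = nn.Parameter(torch.zeros(num_classes))
            self.confidence_layer = nn.Linear(in_features, 1)    # produces sigma(x)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.fc(x)                                  # (B, num_classes)
            sigma = torch.sigmoid(self.confidence_layer(x))      # (B, 1), values in (0, 1)
            # Calibrated logits: per-class scale/bias blended with the raw logits.
            return sigma * (self.logit_scale * logits + self.logit_bias) + (1.0 - sigma) * logits

    Under this reading, one would freeze self.fc in stage 2 (e.g., for p in model.fc.parameters(): p.requires_grad_(False)) and train only logit_scale, logit_bias, and confidence_layer, which matches the behaviour described in the question.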
  • Where is the DisAlignLinear module?


    Hello. Thank you for your impressive project!

    I want to apply DisAlign to classification. However, an error occurs in the import part: https://github.com/Megvii-BaseDetection/DisAlign/blob/a2fc3500a108cb83e3942293a5675c97ab3a2c6e/classification/imagenetlt/resnext50/resx50.scratch.imagenet_lt.224size.90e.disalign.10e/net.py#L7 I couldn't find DisAlignLinear in cvpods.layers, and it does not exist at https://github.com/Megvii-BaseDetection/cvpods/tree/master/cvpods/layers either. How can I solve this problem?

    Thank you!

    opened by jeongHwarr 4
  • Can someone kindly share their codes of Classification task on ImageNet_LT?


    I tried to train the proposed method on ImageNet_LT, but I can only get an average testing accuracy of about 49%, which is far from the 52.9% reported in the paper. Some details of my implementation are as follows:

    (1) The feature extractor is ResNeXt-50 and the head classifier is a linear classifier. The testing accuracy in Stage-One is 43.9%, which is OK.

    (2) The testing accuracy of adopting the cRT method in Stage-Two is 49.6%, which is identical to the one reported in other papers.

    (3) When fine-tuning the model in Stage-Two, both the feature extractor and the head classifier are frozen, and a DisAlignLinear model (which is implemented in cvpods) is retrained. The testing accuracy can only reach 48.8%, which is far from the one reported in your paper.

    opened by smallcube 4
  • The code for semantic segmentation is missing


    Hi, thank you for the nice work, but the code for semantic segmentation is missing and the URL for it in the README could not be opened. Could you please fix this issue?

    opened by curiosity654 3
  • About the reference Distribution p_r in Eq. (10)


    Hi, thank you for providing your code. I was wondering about Equation (10) in your paper (the definition of p_r), which seems not to be a distribution. Since every x_i can only have one label, the reference distribution p_r(y|x_i) will look like (0, 0, 0, ..., w_c, 0, 0, ..., 0), and the sum of this vector is w_c, not 1.

    Could you help me understand this equation? Thanks in advance.

    opened by Kevinz-code 3
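    Restating the concern above in symbols (this follows the question's own description rather than a transcription of the paper's Eq. (10)): if each sample x_i has a single label y_i with class weight w_{y_i}, then

    p_r(y \mid x_i) = w_{y_i} \, \mathbf{1}[y = y_i], \qquad \sum_{y} p_r(y \mid x_i) = w_{y_i},

    which is why the vector sums to w_c rather than 1 for a sample of class c.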
  • import error


    Hi, thanks for the great work. Maybe I missed it, but it seems that the code for this project has been incorporated into cvpods. I couldn't launch any experiments due to ImportErrors like the following: from cvpods.layers import DisAlignLinear raises "ImportError: cannot import name 'DisAlignLinear' from 'cvpods.layers'". Also, I didn't find the corresponding functions in cvpods.

    Any help will be appreciated. Thanks.

    opened by YUE-FAN 2
  • about the confidence score σ(x)


    In the paper, σ(x) is implemented as a linear layer followed by a non-linear activation function (e.g., the sigmoid function) for all inputs x. How should the input x be understood: as the matrix of the raw image, the extracted features, or even the cls_score? Thank you!

    opened by lzed2399 2
  • exp_reweight = exp_reweight / np.sum(exp_reweight) * num_foreground


    Dear author, I have some questions about the code and paper:

    1. exp_reweight = exp_reweight / np.sum(exp_reweight) * num_foreground: why is "exp_reweight" multiplied by the coefficient "num_foreground"? It is not mentioned in the paper.
    2. Is the "K" in the empirical class frequencies r = [r_1, ..., r_K] on the training set the same as the number of classes C of the training set?
    opened by Liu-wanbing 2
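    As a purely numerical illustration of the quoted line (the weight values below are made up; only the normalization itself comes from the code in question), the rescaling keeps the ratios between class weights but forces them to sum to num_foreground, so the average foreground weight becomes 1:

    import numpy as np

    # Hypothetical per-class weights before normalization (made-up values).
    exp_reweight = np.array([4.0, 2.0, 1.0, 1.0])
    num_foreground = 4  # toy number of foreground classes

    # The quoted line: rescale so the weights sum to num_foreground,
    # keeping their ratios; the average foreground weight becomes 1.
    exp_reweight = exp_reweight / np.sum(exp_reweight) * num_foreground
    print(exp_reweight)        # [2.  1.  0.5 0.5]
    print(exp_reweight.sum())  # 4.0 == num_foreground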
  • The DisAlign_Seg page can't open


    opened by Kittywyk 1
  • Do you use validation dataset?


    https://github.com/Megvii-BaseDetection/DisAlign/blob/main/classification/imagenetlt/resnext50/resx50.scratch.imagenet_lt.224size.90e.disalign.10e/config.py#L31

    It seems that you only use the test dataset. What is the reason for that?

    opened by qianlanwyd 1
  • How can I test and augtest the trained semseg DisAlign model?


    opened by jh151170 0
  • the code question in semantic_seg


    Hi, I have a question about the logit_scale and logit_bias in semantic_seg. The shape of these parameters is (1, num_classes, 1, 1); why is it not (1, num_classes, 512, 512), which would match the input image size for semantic segmentation?

    opened by Ianresearch 8
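    For readers with the same question: a (1, num_classes, 1, 1) parameter relies on standard PyTorch broadcasting, so the single per-class scale/bias is applied at every spatial position and a full (1, num_classes, 512, 512) tensor is not needed. A small self-contained demonstration (the shapes below are arbitrary):

    import torch

    num_classes, H, W = 19, 512, 512
    logits = torch.randn(1, num_classes, H, W)        # per-pixel class logits
    logit_scale = torch.ones(1, num_classes, 1, 1)    # one scale per class
    logit_bias = torch.zeros(1, num_classes, 1, 1)    # one bias per class

    # Broadcasting expands the (1, C, 1, 1) parameters across H and W, so every
    # pixel in class channel c is scaled and shifted by the same pair of values.
    calibrated = logit_scale * logits + logit_bias
    print(calibrated.shape)  # torch.Size([1, 19, 512, 512])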
  • Value of the learned scale and bias vector?


    Hi, did you check how the values of the learned scale and bias vectors change throughout the training process? I find that they change in the first few iterations and then remain stable for the rest of training on my own classification dataset. I wonder what the learned vectors look like in your paper? Thanks!

    opened by Jacobew 1
Owner
BaseDetection Team of Megvii