"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Last update: Dec 06, 2022

Overview

FGVC8

Paper "Exploring Vision Transformers for Fine-grained Classification", presented at the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8) at CVPR 2021 on June 25th.

Abstract

Existing computer vision research on categorization struggles with fine-grained attribute recognition because of the inherently high intra-class variance and low inter-class variance. State-of-the-art (SOTA) methods tackle this challenge by locating the most informative image regions and relying on them to classify the complete image. A recent architecture, the Vision Transformer (ViT), shows strong performance in both traditional and fine-grained classification tasks.
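As a minimal illustration of the "locate the informative region, then classify it" idea (not the authors' code), the NumPy sketch below thresholds a per-pixel importance map and crops the image to the bounding box of the informative pixels. `informative_crop` and its `threshold` default are hypothetical; in a multi-stage setup the crop would typically be resized and passed to the classifier again.

```python
import numpy as np


def informative_crop(image, saliency, threshold=0.5):
    """Crop `image` to the bounding box of its most informative region.

    image:     (H, W, C) array.
    saliency:  (H, W) non-negative importance map, e.g. an attention map
               upsampled to the image size (assumed input format).
    threshold: fraction of the peak saliency a pixel must reach to count as
               informative (an illustrative default, not from the paper).
    Returns the crop and its (top, left, bottom, right) box; bottom/right are exclusive.
    """
    if image.shape[:2] != saliency.shape:
        raise ValueError("saliency must match the image's height and width")
    mask = saliency >= threshold * saliency.max()
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing informative pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing informative pixels
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1
    return image[top:bottom, left:right], (top, left, bottom, right)


# Toy example: an 8x8 image whose salient region is a 3x3 block.
img = np.zeros((8, 8, 3))
sal = np.zeros((8, 8))
sal[2:5, 3:6] = 1.0
crop, box = informative_crop(img, sal)
print(crop.shape, box)  # (3, 3, 3) (2, 3, 5, 6)
```

A bounding box is the simplest region to feed back into a fixed-input-size classifier; when the map is uniform (e.g. all zeros), every pixel passes the threshold and the full image is returned.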

In this work, we propose a multi-stage ViT framework for fine-grained image classification that uses the inherent multi-head self-attention mechanism to localize the informative image regions, without requiring architectural changes. We also introduce attention-guided augmentations to improve the model's capabilities.
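One way to turn a ViT's multi-head self-attention into a localization map without changing the architecture is attention rollout (Abnar & Zuidema, 2020): fuse the heads of each layer, add the identity to account for residual connections, renormalize, and multiply the matrices across layers. The NumPy sketch below does this and then applies one attention-guided augmentation, dropping the most-attended patches in the spirit of WS-DAN's attention dropping. Both techniques are assumptions standing in for the authors' exact procedure; `attention_rollout` and `attention_drop` are hypothetical helpers, and the random attention tensors stand in for a real model's per-layer attention weights.

```python
import numpy as np


def attention_rollout(attentions):
    """Attention rollout for a ViT (sketch).

    attentions: list of (heads, tokens, tokens) row-stochastic attention
                matrices, one per layer; token 0 is assumed to be [CLS] and
                the remaining tokens a square grid of patches.
    Returns a (grid, grid) map of how much [CLS] attends to each patch.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity
    for attn in attentions:
        fused = attn.mean(axis=0)  # fuse heads by averaging
        fused = fused + identity  # model the residual (skip) connection
        fused = fused / fused.sum(axis=-1, keepdims=True)  # keep rows stochastic
        rollout = fused @ rollout  # propagate through this layer
    cls_to_patches = rollout[0, 1:]
    grid = int(round(np.sqrt(cls_to_patches.size)))
    return cls_to_patches.reshape(grid, grid)


def attention_drop(image, patch_map, patch_size, drop_fraction=0.1):
    """Zero out the most-attended patches so the model must use other cues.

    Hypothetical helper illustrating an attention-guided augmentation.
    """
    grid_w = patch_map.shape[1]
    k = max(1, int(drop_fraction * patch_map.size))
    top_k = np.argsort(patch_map, axis=None)[-k:]  # flat indices of highest attention
    out = image.copy()
    for idx in top_k:
        r, c = divmod(int(idx), grid_w)
        out[r * patch_size:(r + 1) * patch_size,
            c * patch_size:(c + 1) * patch_size] = 0
    return out


# Toy example with random softmax attention (stand-in for a real ViT's outputs).
rng = np.random.default_rng(0)
layers, heads, grid, patch = 4, 3, 4, 16
tokens = grid * grid + 1
logits = rng.normal(size=(layers, heads, tokens, tokens))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
patch_map = attention_rollout(list(attn))
image = rng.random((grid * patch, grid * patch, 3))
augmented = attention_drop(image, patch_map, patch, drop_fraction=0.25)
```

Averaging over heads follows the original rollout formulation; taking the maximum over heads is a common alternative that yields sharper maps. The same `patch_map`, upsampled to image resolution, could also drive a crop for a second classification stage.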

We demonstrate the value of our approach through experiments on four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also illustrate the model's interpretability through qualitative results.

Instructions

Upcoming

Citation

If you find our results interesting, or use our code or ideas, please consider citing our work:

@misc{conde2021exploring,
      title={Exploring Vision Transformers for Fine-grained Classification}, 
      author={Marcos V. Conde and Kerem Turgutlu},
      year={2021},
      eprint={2106.10587},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}