This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

Last update: Jan 03, 2023

Overview

TransFG: A Transformer Architecture for Fine-grained Recognition

Official PyTorch code for the paper: TransFG: A Transformer Architecture for Fine-grained Recognition

An implementation based on DeiT pretrained on ImageNet-1K with distillation fine-tuning will be released soon.

Framework

Dependencies:

Python 3.7.3
PyTorch 1.5.1
torchvision 0.6.1
ml_collections
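A quick way to see whether an environment matches the versions listed above is sketched below. This is only an illustration, not part of the repository: the helper names (`version_tuple`, `find_mismatches`, `installed_versions`) are hypothetical, and `ml_collections` is left out because no version is listed for it.

```python
import re
import sys

# Versions copied from the Dependencies list above.
REQUIRED = {"python": "3.7.3", "torch": "1.5.1", "torchvision": "0.6.1"}


def version_tuple(version):
    """Turn a string such as '1.5.1+cu101' into (1, 5, 1)."""
    core = version.split("+")[0]  # drop local build suffixes like '+cu101'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def find_mismatches(installed, required=REQUIRED):
    """Return {name: (installed, required)} for every version that differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or version_tuple(found) != version_tuple(wanted):
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions():
    """Collect the running Python version and, if importable, torch/torchvision."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = __import__(name).__version__
        except ImportError:
            pass  # a missing package is reported as a mismatch later
    return versions


if __name__ == "__main__":
    for name, (found, wanted) in find_mismatches(installed_versions()).items():
        print(f"{name}: found {found}, README lists {wanted}")
```

A reported difference does not necessarily mean the code will fail; it only shows where the environment departs from the versions the authors list.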

Usage

1. Download Google pre-trained ViT models

Get the models from this link: ViT-B_16, ViT-B_32...

wget https://storage.googleapis.com/vit_models/imagenet21k/{MODEL_NAME}.npz
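In the command above, `{MODEL_NAME}` is a placeholder for one of the model names, for example `ViT-B_16`. The same download can be sketched in Python as follows. The function names and the `dest_dir` parameter are assumptions made for this example and do not come from the repository.

```python
import os
import urllib.request

# URL pattern copied from the wget command above.
URL_TEMPLATE = "https://storage.googleapis.com/vit_models/imagenet21k/{model_name}.npz"


def checkpoint_url(model_name):
    """Fill the model name into the URL pattern."""
    return URL_TEMPLATE.format(model_name=model_name)


def download_checkpoint(model_name, dest_dir="."):
    """Download one checkpoint (skipped if already present) and return its path."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, model_name + ".npz")
    if not os.path.exists(target):
        urllib.request.urlretrieve(checkpoint_url(model_name), target)
    return target


if __name__ == "__main__":
    # Only prints the URL; call download_checkpoint("ViT-B_16") to fetch the file.
    print(checkpoint_url("ViT-B_16"))
```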

2. Prepare data

In the paper, we use data from 5 publicly available datasets.

Please download them from the official websites and put them in the corresponding folders.

3. Install required packages

Install dependencies with the following command:

pip3 install -r requirements.txt

4. Train

To train TransFG on the CUB-200-2011 dataset with 4 GPUs in FP16 mode for 10000 steps, run:

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m torch.distributed.launch --nproc_per_node=4 train.py --dataset CUB_200_2011 --split overlap --num_steps 10000 --fp16 --name sample_run
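When the number of GPUs changes, both `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` have to change together. The sketch below builds the command above from a list of GPU ids so the two stay consistent. It uses only the flags shown in the command. The helper name `build_train_command` and its defaults are assumptions for illustration, not part of the repository.

```python
def build_train_command(gpu_ids, dataset="CUB_200_2011", split="overlap",
                        num_steps=10000, fp16=True, name="sample_run"):
    """Return (extra_env, argv) for the distributed training launch."""
    # One process per visible GPU, as in the example command.
    extra_env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}
    argv = [
        "python3", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        "train.py",
        "--dataset", dataset,
        "--split", split,
        "--num_steps", str(num_steps),
    ]
    if fp16:
        argv.append("--fp16")
    argv += ["--name", name]
    return extra_env, argv


if __name__ == "__main__":
    env, argv = build_train_command([0, 1, 2, 3])
    # Print the equivalent shell line instead of launching training.
    print(" ".join(f"{k}={v}" for k, v in env.items()), " ".join(argv))
```

To actually launch the run, the result could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` from the repository root.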

Citation

If you find our work helpful in your research, please cite it as:

@article{he2021transfg,
  title={TransFG: A Transformer Architecture for Fine-grained Recognition},
  author={He, Ju and Chen, Jieneng and Liu, Shuai and Kortylewski, Adam and Yang, Cheng and Bai, Yutong and Wang, Changhu and Yuille, Alan},
  journal={arXiv preprint arXiv:2103.07976},
  year={2021}
}

Acknowledgement

Many thanks to ViT-pytorch for the PyTorch reimplementation of "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale".


