Codes for TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization.

Overview

TS-CAM: Token Semantic Coupled Attention Map for Weakly SupervisedObject Localization

This is the official implementaion of paper TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

This repository contains Pytorch training code, evaluation code, pretrained models and jupyter notebook for more visualization.

Illustration

Based on Deit, TS-CAM couples attention maps from visual image transformer with semantic-aware maps to obtain accurate localization maps (Token Semantic Coupled Attention Map, ts-cam).

ts-cam

Model Zoo

We provide pretrained TS-CAM models trained on CUB-200-2011 and ImageNet_ILSVRC2012 datasets.

Dataset [email protected] [email protected] Loc.Gt-Known [email protected] [email protected] Baidu Drive Google Drive
CUB-200-2011 71.3 83.8 87.7 80.3 94.8 model model
ILSVRC2012 53.4 64.3 67.6 74.3 92.1 model model

Note: the Extrate Code for Baidu Drive is as follows:

Usage

First clone the repository locally:

git clone https://github.com/vasgaowei/TS-CAM.git

Then install Pytorch 1.7.0+ and torchvision 0.8.1+ and pytorch-image-models 0.3.2:


conda create -n pytorch1.7 python=3.6
conda activate pytorc1.7
conda install anaconda
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install timm==0.3.2

Data preparation

CUB-200-2011 dataset

Please download and extrate CUB-200-2011 dataset.

The directory structure is the following:

TS-CAM/
  data/
    CUB-200-2011/
      attributes/
      images/
      parts/
      bounding_boxes.txt
      classes.txt
      image_class_labels.txt
      images.txt
      image_sizes.txt
      README
      train_test_split.txt

ImageNet1k

Download ILSVRC2012 dataset and extract train and val images.

The directory structure is organized as follows:

TS-CAM/
  data/
  ImageNet_ILSVRC2012/
    ILSVRC2012_list/
    train/
      n01440764/
        n01440764_18.JPEG
        ...
      n01514859/
        n01514859_1.JPEG
        ...
    val/
      n01440764/
        ILSVRC2012_val_00000293.JPEG
        ...
      n01531178/
        ILSVRC2012_val_00000570.JPEG
        ...
    ILSVRC2012_list/
      train.txt
      val_folder.txt
      val_folder_new.txt

And the training and validation data is expected to be in the train/ folder and val folder respectively:

For training:

On CUB-200-2011 dataset:

bash train_val_cub.sh {GPU_ID} ${NET}

On ImageNet1k dataset:

bash train_val_ilsvrc.sh {GPU_ID} ${NET}

Please note that pretrained model weights of Deit-tiny, Deit-small and Deit-base on ImageNet-1k model will be downloaded when you first train you model, so the Internet should be connected.

For evaluation:

On CUB-200-2011 dataset:

bash val_cub.sh {GPU_ID} ${NET} ${MODEL_PATH}

On ImageNet1k dataset:

bash val_ilsvrc.sh {GPU_ID} ${NET} ${MODEL_PATH}

GPU_ID should be specified and multiple GPUs can be used for accelerating training and evaluation.

NET shoule be chosen among tiny, small and base.

MODEL_PATH is the path of pretrained model.

Visualization

We provided jupyter notebook in tools_cam folder.

TS-CAM/
  tools-cam/
    visualization_attention_map_cub.ipynb
    visualization_attention_map_imaget.ipynb

Please download pretrained TS-CAM model weights and try more visualzation results((Attention maps using our method and Attention Rollout method)). You can try other interseting images you like to show the localization map(ts-cams).

Visualize localization results

We provide some visualization results as follows.

localization

Visualize attention maps

We can also visualize attention maps from different transformer layers.

attention maps_cub attention_map_ilsvrc

Contacts

If you have any question about our work or this repository, please don't hesitate to contact us by emails.

You can also open an issue under this project.

Citation

If you use this code for a paper please cite:

@article{Gao2021TSCAMTS,
  title={TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization},
  author={Wei Gao and Fang Wan and Xingjia Pan and Zhiliang Peng and Qi Tian and Zhenjun Han and Bolei Zhou and Qixiang Ye},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.14862}
}
Owner
vasgaowei
vasgaowei
This is the repository for paper NEEDLE: Towards Non-invertible Backdoor Attack to Deep Learning Models.

This is the repository for paper NEEDLE: Towards Non-invertible Backdoor Attack to Deep Learning Models.

1 Oct 25, 2021
A supplementary code for Editable Neural Networks, an ICLR 2020 submission.

Editable neural networks A supplementary code for Editable Neural Networks, an ICLR 2020 submission by Anton Sinitsin, Vsevolod Plokhotnyuk, Dmitry Py

Anton Sinitsin 32 Nov 29, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Dominik Klein 189 Dec 21, 2022
Repository of our paper 'Refer-it-in-RGBD' in CVPR 2021

Refer-it-in-RGBD This is the repository of our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images' in CVPR 2021 Pape

Haolin Liu 34 Nov 07, 2022
RL-driven agent playing tic-tac-toe on starknet against challengers.

tictactoe-on-starknet RL-driven agent playing tic-tac-toe on starknet against challengers. GUI reference: https://pythonguides.com/create-a-game-using

21 Jul 30, 2022
PixelPyramids: Exact Inference Models from Lossless Image Pyramids (ICCV 2021)

PixelPyramids: Exact Inference Models from Lossless Image Pyramids This repository contains the PyTorch implementation of the paper PixelPyramids: Exa

Visual Inference Lab @TU Darmstadt 8 Dec 11, 2022
This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021.

MCGC Description This is the code of "Multi-view Contrastive Graph Clustering" in NeurlPS 2021. Datasets Results ACM DBLP IMDB Amazon photos Amazon co

31 Nov 14, 2022
[NeurIPS 2021] "Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems"

Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems Introduction Multi-agent control i

VITA 6 May 05, 2022
Multivariate Boosted TRee

Multivariate Boosted TRee What is MBTR MBTR is a python package for multivariate boosted tree regressors trained in parameter space. The package can h

SUPSI-DACD-ISAAC 61 Dec 19, 2022
House_prices_kaggle - Predict sales prices and practice feature engineering, RFs, and gradient boosting

House Prices - Advanced Regression Techniques Predicting House Prices with Machine Learning This project is build to enhance my knowledge about machin

Gurpreet Singh 1 Jan 01, 2022
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

Pattern Pattern is a web mining module for Python. It has tools for: Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM par

Computational Linguistics Research Group 8.4k Jan 03, 2023
A collection of resources on GAN Inversion.

This repo is a collection of resources on GAN inversion, as a supplement for our survey

On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation

On Nonlinear Latent Transformations for GAN-based Image Editing - PyTorch implementation On Nonlinear Latent Transformations for GAN-based Image Editi

Valentin Khrulkov 22 Oct 24, 2022
Winners of the Facebook Image Similarity Challenge

Winners of the Facebook Image Similarity Challenge

DrivenData 111 Jan 05, 2023
Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning

T2I_CL This is the official Pytorch implementation of the paper Improving Text-to-Image Synthesis Using Contrastive Learning Requirements Linux Python

42 Dec 31, 2022
Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 04, 2020
This package contains a PyTorch Implementation of IB-GAN of the submitted paper in AAAI 2021

The PyTorch implementation of IB-GAN model of AAAI 2021 This package contains a PyTorch implementation of IB-GAN presented in the submitted paper (IB-

Insu Jeon 9 Mar 30, 2022
DrNAS: Dirichlet Neural Architecture Search

This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem. We treat the continuously relaxed architecture mixing weight as random va

Xiangning Chen 37 Jan 03, 2023
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Katsuya Hyodo 24 Mar 02, 2022