[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer

The official implementation of the CVPR 2022 paper "MixFormer: End-to-End Tracking with Iterative Mixed Attention".

[Models and Raw results] (Google Drive) [Models and Raw results] (Baidu Drive: hmuv)

[Figure: MixFormer framework]

News

[Mar 21, 2022]

  • MixFormer was accepted to CVPR 2022.
  • We released the code, models, and raw results.

[Mar 29, 2022]

  • Our paper was selected for an oral presentation.

Highlights

New transformer tracking framework

MixFormer is composed of a target-search mixed attention (MAM) based backbone and a simple corner head, yielding a compact tracking pipeline without an explicit integration module.
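As a rough illustration of the idea, the sketch below treats mixed attention as ordinary scaled dot-product attention applied to the concatenation of target and search tokens, so that feature extraction and target-search interaction happen in one operation. This is a simplified reading, not the repository's implementation: the function names, the single head, and the identity projections (queries, keys, and values are the raw tokens) are all assumptions made for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def mixed_attention(target_tokens, search_tokens):
    """Attend jointly over target and search tokens (hypothetical helper).

    Assumption: identity projections and a single head, for illustration only.
    """
    tokens = target_tokens + search_tokens  # concatenate along the token axis
    mixed = attention(tokens, tokens, tokens)
    n_target = len(target_tokens)
    # Split the result back into its target and search parts.
    return mixed[:n_target], mixed[n_target:]


# Toy input: 2 target tokens and 3 search tokens, each of dimension 2.
target = [[1.0, 0.0], [0.0, 1.0]]
search = [[1.0, 1.0], [0.5, -0.5], [-1.0, 0.0]]
new_target, new_search = mixed_attention(target, search)
print(len(new_target), len(new_search))
```

Because every token attends to the joint sequence, each search token already carries target information after the call, which is one way to see why no separate integration module is needed.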

End-to-end, positional-embedding-free, multi-feature-aggregation-free

MixFormer is an end-to-end tracking framework without post-processing. Unlike other transformer trackers, MixFormer doesn't use positional embeddings, attention masks, or a multi-layer feature aggregation strategy.

Strong performance

Tracker                 VOT2020 (EAO)  LaSOT (NP)  GOT-10K (AO)  TrackingNet (NP)
MixFormer               0.555          79.9        70.7          88.9
ToMP101* (CVPR2022)     -              79.2        -             86.4
SBT-large* (CVPR2022)   0.529          -           70.4          -
SwinTrack* (Arxiv2021)  -              78.6        69.4          88.2
Sim-L/14* (Arxiv2022)   -              79.7        69.8          87.4
STARK (ICCV2021)        0.505          77.0        68.8          86.9
KeepTrack (ICCV2021)    -              77.2        -             -
TransT (CVPR2021)       0.495          73.8        67.1          86.7
TrDiMP (CVPR2021)       -              -           67.1          83.3
Siam R-CNN (CVPR2020)   -              72.2        64.9          85.4
TREG (Arxiv2021)        -              74.1        66.8          83.8

Install the environment

Use Anaconda:

conda create -n mixformer python=3.6
conda activate mixformer
bash install_pytorch17.sh
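After the install script finishes, a quick sanity check can confirm that the environment looks usable. The sketch below is not part of the repository: `check_environment` is a hypothetical helper, and it assumes the script installs a package importable as `torch` (suggested by the script name, but not confirmed here).

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6), packages=("torch",)):
    """Report whether the interpreter and top-level packages are available.

    Assumption: `packages` holds top-level import names only (no dots).
    """
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level module cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print("{:<10} {}".format(key, "ok" if ok else "MISSING"))
```

Run it inside the activated `mixformer` environment; a `MISSING` entry suggests the install step did not complete.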

Data Preparation

Put the tracking datasets in ./data. It should look like:

${MixFormer_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
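The layout above can be checked before training with a short script. This is a minimal sketch rather than a tool shipped with the repository: `missing_dirs` and `EXPECTED` are hypothetical names, and only the directories spelled out in the tree are checked (the LaSOT category folders are elided above, so only the top-level `lasot` directory is verified).

```python
from pathlib import Path

# Sub-directories listed explicitly in the tree above.
EXPECTED = {
    "lasot": [],  # category folders are elided in the tree, so not checked
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "train2017"],
    "trackingnet": ["TEST"] + ["TRAIN_{}".format(i) for i in range(12)],
}


def missing_dirs(data_root, expected=EXPECTED):
    """Return the expected dataset directories that do not exist."""
    root = Path(data_root)
    missing = []
    for dataset, subdirs in expected.items():
        if not (root / dataset).is_dir():
            missing.append(dataset)  # whole dataset absent, skip its children
            continue
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append("{}/{}".format(dataset, sub))
    return missing


if __name__ == "__main__":
    problems = missing_dirs("./data")
    print("layout ok" if not problems else "missing: " + ", ".join(problems))
```

An empty result means every directory named in the tree is present; it does not validate the contents of those directories.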

Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing

Train MixFormer

Train with multiple GPUs using DDP. More details on the other training settings can be found in tracking/train_mixformer.sh.

# MixFormer
bash tracking/train_mixformer.sh

Test and evaluate MixFormer on benchmarks

  • LaSOT/GOT10k-test/TrackingNet/OTB100/UAV123. More details on the test settings can be found in tracking/test_mixformer.sh.

bash tracking/test_mixformer.sh

  • VOT2020. Before evaluating "MixFormer+AR" on VOT2020, install the extra packages by following external/AR/README.md. The VOT toolkit is also required to evaluate our tracker; to download and install it, you can follow this tutorial. For convenience, you can use our example VOT toolkit workspaces under external/vot20/ after setting trackers.ini.

cd external/vot20/<workspace_dir>
vot evaluate --workspace . MixFormerPython
# generate the analysis results
vot analysis --workspace . --nocache

Run MixFormer on your own video

bash tracking/run_video_demo.sh

Compute FLOPs/Params and test speed

bash tracking/profile_mixformer.sh

Visualize attention maps

bash tracking/vis_mixformer_attn.sh

[Figure: attention map visualization]

Model Zoo and raw results

The trained models and the raw tracking results are available from [Models and Raw results] (Google Drive) or [Models and Raw results] (Baidu Drive: hmuv).

Contact

Yutao Cui: [email protected]

Cheng Jiang: [email protected]

Acknowledgments

  • Thanks to the PyTracking and STARK libraries, which helped us quickly implement our ideas.
  • We use the CvT implementation from the official CvT repo.
Owner
Multimedia Computing Group, Nanjing University