[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

Last update: Jan 03, 2023

MixFormer

The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention

[Models and Raw results] (Google Drive) [Models and Raw results] (Baidu Drive: hmuv)

News

[Mar 21, 2022]

MixFormer is accepted to CVPR 2022.
We release code, models, and raw results.

[Mar 29, 2022]

Our paper is selected for an oral presentation.

Highlights

✨ New transformer tracking framework

MixFormer is composed of a target-search mixed attention (MAM) based backbone and a simple corner head, yielding a compact tracking pipeline without an explicit integration module.
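As a rough illustration of the idea, the sketch below runs attention over the concatenation of target and search tokens, so feature extraction and target-search interaction happen in one step. It is a toy reading of the description above, not the repository's module: the function names are hypothetical, and identity projections, a single head, and plain Python lists are simplifying assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mixed_attention(target_tokens, search_tokens):
    """Hypothetical sketch: both token sets attend to the mixed sequence.

    Assumption: identity projections, so each token serves as its own
    query, key, and value. A real module would use learned projections.
    """
    mixed = target_tokens + search_tokens  # concatenated keys and values
    target_out = attend(target_tokens, mixed, mixed)
    search_out = attend(search_tokens, mixed, mixed)
    return target_out, search_out


if __name__ == "__main__":
    t_out, s_out = mixed_attention([[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]])
    print(len(t_out), len(s_out))
```

Because the search tokens see the target tokens inside the same attention call, no separate integration module is needed in this sketch, which mirrors the compact pipeline described above.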

✨ End-to-end, positional-embedding-free, multi-feature-aggregation-free

MixFormer is an end-to-end tracking framework without post-processing. Unlike other transformer trackers, MixFormer doesn't use positional embeddings, attention masks, or a multi-layer feature aggregation strategy.

✨ Strong performance

Tracker	VOT2020 (EAO)	LaSOT (NP)	GOT-10K (AO)	TrackingNet (NP)
MixFormer	0.555	79.9	70.7	88.9
ToMP101* (CVPR2022)	-	79.2	-	86.4
SBT-large* (CVPR2022)	0.529	-	70.4	-
SwinTrack* (Arxiv2021)	-	78.6	69.4	88.2
Sim-L/14* (Arxiv2022)	-	79.7	69.8	87.4
STARK (ICCV2021)	0.505	77.0	68.8	86.9
KeepTrack (ICCV2021)	-	77.2	-	-
TransT (CVPR2021)	0.495	73.8	67.1	86.7
TrDiMP (CVPR2021)	-	-	67.1	83.3
Siam R-CNN (CVPR2020)	-	72.2	64.9	85.4
TREG (Arxiv2021)	-	74.1	66.8	83.8
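To read the table as margins rather than raw scores, a small sketch like the following computes the gap between two rows. The numbers are copied from the MixFormer and STARK rows above; the variable and function names are only illustrative.

```python
# Scores copied from the MixFormer and STARK rows of the table above.
BENCHMARKS = ["VOT2020 (EAO)", "LaSOT (NP)", "GOT-10K (AO)", "TrackingNet (NP)"]
MIXFORMER = [0.555, 79.9, 70.7, 88.9]
STARK = [0.505, 77.0, 68.8, 86.9]


def gains(ours, baseline):
    """Per-benchmark difference, rounded to hide floating-point noise."""
    return [round(a - b, 3) for a, b in zip(ours, baseline)]


if __name__ == "__main__":
    for name, gain in zip(BENCHMARKS, gains(MIXFORMER, STARK)):
        print(f"{name}: {gain:+}")
```

For these two rows the margins are 0.05 EAO on VOT2020 and 2.9, 1.9, and 2.0 points on LaSOT, GOT-10K, and TrackingNet respectively.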

Install the environment

Use Anaconda

conda create -n mixformer python=3.6
conda activate mixformer
bash install_pytorch17.sh

Data Preparation

Put the tracking datasets in ./data. It should look like:

${MixFormer_ROOT}
 -- data
     -- lasot
         |-- airplane
         |-- basketball
         |-- bear
         ...
     -- got10k
         |-- test
         |-- train
         |-- val
     -- coco
         |-- annotations
         |-- train2017
     -- trackingnet
         |-- TRAIN_0
         |-- TRAIN_1
         ...
         |-- TRAIN_11
         |-- TEST
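Before training, it can help to confirm that the layout above is in place. The helper below is a hypothetical sketch, not part of the repository: it checks only the directories that are explicitly listed above (the "..." entries are skipped) and reports the ones that are missing.

```python
from pathlib import Path

# Sub-directories copied from the layout above; elided entries are not checked.
EXPECTED = {
    "lasot": ["airplane", "basketball", "bear"],
    "got10k": ["test", "train", "val"],
    "coco": ["annotations", "train2017"],
    "trackingnet": ["TRAIN_0", "TRAIN_1", "TRAIN_11", "TEST"],
}


def missing_dirs(data_root):
    """Return the expected directories that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for dataset, subdirs in EXPECTED.items():
        base = root / dataset
        if not base.is_dir():
            # Report the dataset root once instead of every sub-directory.
            missing.append(str(base))
            continue
        for sub in subdirs:
            if not (base / sub).is_dir():
                missing.append(str(base / sub))
    return missing


if __name__ == "__main__":
    for path in missing_dirs("./data"):
        print("missing:", path)
```

Run from ${MixFormer_ROOT}, the script prints nothing when every listed directory is present.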

Set project paths

Run the following command to set the paths for this project:

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir .

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
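A quick way to confirm that the command produced both files is sketched below. The helper name is hypothetical and the check only looks for the two paths named above; it does not inspect their contents.

```python
from pathlib import Path

# The two path files named above, relative to the repository root.
LOCAL_FILES = ["lib/train/admin/local.py", "lib/test/evaluation/local.py"]


def missing_local_files(repo_root="."):
    """Return the local path files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in LOCAL_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    absent = missing_local_files(".")
    if absent:
        print("not found:", ", ".join(absent))
    else:
        print("both local.py files are present")
```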

Train MixFormer

Training runs on multiple GPUs using DDP (PyTorch DistributedDataParallel). Other training settings can be found in tracking/train_mixformer.sh

# MixFormer
bash tracking/train_mixformer.sh

Test and evaluate MixFormer on benchmarks

Supported benchmarks: LaSOT, GOT10k-test, TrackingNet, OTB100, and UAV123. Details of the test settings can be found in tracking/test_mixformer.sh

bash tracking/test_mixformer.sh

VOT2020
Before evaluating "MixFormer+AR" on VOT2020, please install the extra packages listed in external/AR/README.md. The VOT toolkit is also required to evaluate our tracker. To download and install the VOT toolkit, you can follow this tutorial. For convenience, you can use our example VOT toolkit workspaces under external/vot20/ by setting trackers.ini.

cd external/vot20/<workspace_dir>
vot evaluate --workspace . MixFormerPython
# generating analysis results
vot analysis --workspace . --nocache

Run MixFormer on your own video

bash tracking/run_video_demo.sh

Compute FLOPs/Params and test speed

bash tracking/profile_mixformer.sh

Visualize attention maps

bash tracking/vis_mixformer_attn.sh

Model Zoo and raw results

The trained models and the raw tracking results are provided in the [Models and Raw results] (Google Drive) or [Models and Raw results] (Baidu Drive: hmuv).

Contact

Yutao Cui: [email protected]

Cheng Jiang: [email protected]

Acknowledgments

Thanks to the PyTracking and STARK libraries, which helped us quickly implement our ideas.
We use the CvT implementation from the official CvT repo.

Owner

Multimedia Computing Group, Nanjing University
