TrTr: Visual Tracking with Transformer

We propose a novel tracker network based on the Transformer encoder-decoder architecture, a powerful attention mechanism, to capture global and rich contextual interdependencies. In this architecture, features of the template image are processed by a self-attention module in the encoder to learn strong context information, which is then sent to the decoder to compute cross-attention with the search image features, themselves processed by another self-attention module. In addition, we design the classification and regression heads on top of the Transformer output to localize the target based on a shape-agnostic anchor. We extensively evaluate our tracker, TrTr, on several benchmarks, and our method performs favorably against state-of-the-art algorithms.
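The data flow described above can be illustrated with a toy, dependency-free sketch. It is not the TrTr implementation: there are no learned projections, no multi-head attention, no feed-forward layers, and no positional encodings. The function names (`attention`, `trtr_like_forward`) are hypothetical, and the choice of search features as queries against the template memory in the cross-attention step is an assumption made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append(
            [sum(wi * v[d] for wi, v in zip(w, values)) for d in range(len(values[0]))]
        )
    return outputs, weights


def trtr_like_forward(template_tokens, search_tokens):
    """Toy encoder-decoder flow (hypothetical helper, not the repository's code)."""
    # Encoder: self-attention over the template features.
    memory, _ = attention(template_tokens, template_tokens, template_tokens)
    # Decoder, step 1: self-attention over the search features.
    search_ctx, _ = attention(search_tokens, search_tokens, search_tokens)
    # Decoder, step 2: cross-attention. ASSUMPTION: search features act as
    # queries and the encoded template acts as keys and values.
    fused, cross_weights = attention(search_ctx, memory, memory)
    return fused, cross_weights


template = [[1.0, 0.0], [0.0, 1.0]]            # 2 template tokens, dimension 2
search = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]  # 3 search tokens, dimension 2
fused, cross_w = trtr_like_forward(template, search)
print(len(fused), len(fused[0]))
```

The output has one fused vector per search token, which is the kind of per-location feature a classification or regression head could then consume.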

Network architecture of TrTr for visual tracking

Installation

Install dependencies

$ ./install.sh ~/anaconda3 trtr 

Note 1: this assumes that Anaconda is installed under ~/anaconda3.

Note 2: select a CUDA toolkit version suitable for installing PyTorch from conda. The default is 10.1; for an RTX 3090, select 11.0, in which case the installation command becomes $ ./install.sh ~/anaconda3 trtr 11.0.
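After installation, it may help to confirm which CUDA build of PyTorch ended up in the environment. The following is a minimal sketch, not part of the repository; `describe_torch_install` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch_install():
    """Report the installed PyTorch build, or note that PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "torch": None, "cuda_build": None, "gpu_visible": False}
    return {
        "installed": True,
        "torch": torch.__version__,
        # CUDA toolkit version the build was compiled against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process.
        "gpu_visible": torch.cuda.is_available(),
    }


report = describe_torch_install()
print(report)
```

Run it inside the activated conda environment; if `cuda_build` does not match the version passed to install.sh, the wrong toolkit was probably selected.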

Activate conda environment

$ conda activate trtr

Quick Start: Using TrTr

Webcam demo

Offline Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth --use_baseline_tracker

Online Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth

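Image sequences (png, jpeg)

Add the option --video_name ${video_dir}, where ${video_dir} is the directory containing the images.

Video (mp4 or avi)

Add the option --video_name ${video_name}, where ${video_name} is the video file.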
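The demo variants above differ only in two flags, so they can be assembled programmatically. This is a small sketch under the assumption that the commands are run from the repository root; `demo_command` is a hypothetical helper, and the flags are exactly those shown above.

```python
def demo_command(checkpoint="networks/trtr_resnet50.pth", online=True, video_name=None):
    """Build the argv list for demo.py (hypothetical helper, flags taken from this README)."""
    cmd = ["python", "demo.py", "--tracker.checkpoint", checkpoint]
    if not online:
        # The offline model is selected with --use_baseline_tracker.
        cmd.append("--use_baseline_tracker")
    if video_name is not None:
        # Either an image-sequence directory or an mp4/avi file; omit for the webcam.
        cmd += ["--video_name", video_name]
    return cmd


webcam_online = demo_command()
file_offline = demo_command(online=False, video_name="my_video.mp4")  # placeholder file name
print(" ".join(file_offline))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if the demo should be started from a script.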

Benchmarks

Download testing datasets

Please read this README.md to prepare the dataset.

Basic usage

Test tracker

$ cd benchmark
$ python test.py --cfg_file ../parameters/experiment/vot2018/offline.yaml
  • --cfg_file: the yaml file containing the hyper-parameters for each dataset. Please check ./benchmark/parameters/experiment for more yaml files.
    • online model for VOT2018: python test.py --cfg_file ../parameters/experiment/vot2018/online.yaml
    • online model for OTB: python test.py --cfg_file ../parameters/experiment/otb/online.yaml
  • --result_path: optional parameter specifying the directory in which to store the tracking results. The default value is results, which generates ./benchmark/results/${dataset_name}.
  • --model_name: optional parameter specifying the name of the tracker under the result path. The default value is trtr, which yields the tracker directory ./benchmark/results/${dataset_name}/trtr.
  • --vis: visualize tracking.
  • --repetition: number of repetitions. For example, assign --repetition 15 for the VOT benchmarks, following the official evaluation.

Eval tracker

$ cd benchmark
$ python eval.py
  • --dataset: the benchmark to evaluate. The default value is VOT2018. Other benchmark names can be assigned, e.g., OTB, VOT2019, UAV, etc.
  • --tracker_path: the result directory. The default value is ./benchmark/results. This parameter corresponds to the --result_path parameter of python test.py.
  • --num: the number of threads used when evaluating multiple tracker results. The default is 1.

(Optional) Hyper-parameter search

$ python hp_search.py --tracker.checkpoint ../networks/trtr_resnet50.pth --tracker.search_sizes 280 --separate --repetition 1  --use_baseline_tracker --tracker.model.transformer_mask True

Train

Download training datasets

Please read this README.md to prepare the training dataset.

Download VOT2018 dataset

  1. Please download the VOT2018 dataset following [this README]; it is needed for testing the model during training.
  2. Alternatively, you can skip this testing step by setting several parameters, which are explained later.

Train with single GPU

$ python main.py  --cfg_file ./parameters/train/default.yaml --output_dir train

Note 1: please check ./parameters/train/default.yaml for the training parameters.

Note 2: --output_dir assigns the path in which to store the training result. The above command generates ./train.

Note 3: you may have to raise the open-file limit: ulimit -n 8192. Adding this line to ~/.bashrc may be more convenient.

Note 4: you can set a larger value for --benchmark_start_epoch than for --epochs to skip the benchmark test, e.g., --benchmark_start_epoch 21 and --epochs 20.
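Why a start epoch larger than the epoch count skips the benchmark test can be shown with a short sketch. It does not reproduce the logic of main.py: the 0-based epoch numbering and the `>=` comparison are assumptions, and `benchmark_epochs` is a hypothetical helper.

```python
def benchmark_epochs(epochs, benchmark_start_epoch):
    """Epochs after which a benchmark test would run.

    ASSUMPTION: epochs are numbered from 0 and the benchmark runs whenever
    the current epoch is >= benchmark_start_epoch.
    """
    return [epoch for epoch in range(epochs) if epoch >= benchmark_start_epoch]


skipped = benchmark_epochs(epochs=20, benchmark_start_epoch=21)
every_epoch = benchmark_epochs(epochs=10, benchmark_start_epoch=0)
print(skipped)      # no epoch reaches the start threshold
print(every_epoch)  # benchmark after every epoch
```

With --benchmark_start_epoch 21 and --epochs 20 the threshold is never reached, so the VOT2018 dataset is not needed during training.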

Debug mode for quickly checking the training process:

$ python main.py  --cfg_file ./parameters/train/default.yaml  --batch_size 16 --dataset.paths ./datasets/yt_bb/dataset/Curation  ./datasets/vid/dataset/Curation/ --dataset.video_frame_ranges 3 100  --dataset.num_uses 100 100  --dataset.eval_num_uses 100 100  --resume networks/trtr_resnet50.pth --benchmark_start_epoch 0 --epochs 10

Multi GPUs

Multiple GPUs on a single machine

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train

--nproc_per_node: the number of GPUs to use. The above command uses two GPUs on one machine.

Multiple GPUs across multiple machines

Master Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train  --benchmark_start_epoch 8
  • --nnodes: the number of machines to use. The above command uses two machines.
  • --node_rank: the id of each machine. The master must be 0.
  • --master_addr: the IP address of the master machine.
  • --master_port: an open port on the master machine (e.g., 8080).

Slave1 Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml
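The per-machine commands above follow one pattern, which the sketch below generates for every node. `launch_command` and `world_size` are hypothetical helpers, and the IP address is a placeholder from the documentation range; the flags are the ones shown in this section. The total number of training processes is the number of machines times the number of GPUs per machine.

```python
def launch_command(node_rank, nnodes=1, nproc_per_node=2,
                   master_addr=None, master_port=None, extra_args=()):
    """Build the torch.distributed.launch argv for one machine (hypothetical helper)."""
    cmd = ["python", "-m", "torch.distributed.launch", f"--nproc_per_node={nproc_per_node}"]
    if nnodes > 1:
        # Multi-machine runs also need the rendezvous information.
        cmd += [
            f"--nnodes={nnodes}",
            f"--node_rank={node_rank}",
            f"--master_addr={master_addr}",
            f"--master_port={master_port}",
        ]
    cmd += ["--use_env", "main.py", "--cfg_file", "./parameters/train/default.yaml"]
    cmd += list(extra_args)
    return cmd


def world_size(nnodes, nproc_per_node):
    """Total number of processes (one per GPU) across all machines."""
    return nnodes * nproc_per_node


# Two machines with two GPUs each; 192.0.2.10 is a placeholder address.
commands = [
    launch_command(rank, nnodes=2, nproc_per_node=2,
                   master_addr="192.0.2.10", master_port=8080,
                   extra_args=("--output_dir", "train") if rank == 0 else ())
    for rank in range(2)
]
for cmd in commands:
    print(" ".join(cmd))
print(world_size(2, 2))
```

Each generated command must be started on its own machine, with rank 0 on the master.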

Owner

趙 漠居 (Zhao, Moju)
Project Lecturer at the University of Tokyo.