An Implementation of SiameseRPN with Feature Pyramid Networks

Overview

SiameseRPN with FPN

This project is based mainly on HelloRicky123/Siamese-RPN. My only change is to add a Feature Pyramid Network (FPN) to the original AlexNet backbone.

For more details about SiameseRPN, please refer to the paper: High Performance Visual Tracking with Siamese Region Proposal Network by Bo Li, Junjie Yan, Wei Wu, Zheng Zhu, and Xiaolin Hu.

For more details about Feature Pyramid Networks, please refer to the paper: Feature Pyramid Networks for Object Detection by Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie.

Networks

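  • Siamese Region Proposal Networks

    [Figure: Siamese Region Proposal Network architecture]

  • Feature Pyramid Networks

    [Figure: Feature Pyramid Network architecture]

  • SiameseRPN+FPN

    • Template Branch

      [Figure: template branch architecture]

    • Detection Branch

      [Figure: detection branch architecture]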
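To make the FPN idea concrete, here is a minimal sketch of the top-down merge step from the FPN paper: a coarse, deep feature map is upsampled and added element-wise to a finer, shallower one. This is not the code in net/network.py. It is a single-channel toy in pure Python, the function names are hypothetical, it assumes the coarse map is exactly half the size of the fine map, and it leaves out the 1x1 lateral convolution and the 3x3 smoothing convolution that a real FPN applies around the addition.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in fmap:
        # Repeat every value `factor` times along the width...
        wide = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        out.extend([list(wide) for _ in range(factor)])
    return out


def merge_top_down(coarse, fine, factor=2):
    """Upsample the coarse map and add it element-wise to the fine map."""
    up = upsample_nearest(coarse, factor)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("upsampled coarse map must match the fine map size")
    return [
        [u + f for u, f in zip(up_row, fine_row)]
        for up_row, fine_row in zip(up, fine)
    ]


# Toy example: a 2x2 "deep" map merged into a 4x4 "shallow" map.
coarse = [[1.0, 2.0], [3.0, 4.0]]
fine = [[0.1] * 4 for _ in range(4)]
merged = merge_top_down(coarse, fine)
```

In a PyTorch model the same step would operate on multi-channel tensors, and merging more than two levels repeats it from the deepest layer downwards.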

Results

This project reaches 0.618 AUC on OTB100, a relative improvement of about 1.3% over the baseline Siamese-RPN (0.610 AUC). The ablation results also show that performance is stable across different operating systems and GPUs.
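For readers unfamiliar with the metric, the sketch below shows how an OTB-style success AUC is commonly computed: the overlap (IoU) between the predicted and ground-truth box is measured per frame, the fraction of frames whose overlap exceeds each threshold gives the success curve, and the AUC is the mean of that curve. The function names are hypothetical, and the 21 evenly spaced thresholds from 0 to 1 are an assumption based on the usual OTB protocol. This is not the evaluation code used by bin/test_OTB.py.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(overlaps, num_thresholds=21):
    """Mean success rate over evenly spaced overlap thresholds in [0, 1]."""
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at threshold t: fraction of frames with overlap > t.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


# Toy sequence of three frames: (predicted box, ground-truth box).
frames = [
    ((0, 0, 10, 10), (0, 0, 10, 10)),    # perfect overlap
    ((0, 0, 10, 10), (5, 0, 10, 10)),    # half-shifted box
    ((0, 0, 10, 10), (20, 20, 10, 10)),  # target lost
]
overlaps = [iou(pred, gt) for pred, gt in frames]
auc = success_auc(overlaps)
```

The benchmark figure is then the average of this per-sequence value over all OTB100 sequences.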

Data preparation

I only used pre-trained models for my experiments, so here I only provide the test dataset, OTB100, which I downloaded from http://cvlab.hanyang.ac.kr/tracker_benchmark/

If you don't want to download it from the website above, you can use this mirror instead: https://pan.baidu.com/s/1vWIn8ovCGKmlgIdHdt_MkA key: p8u4

For more details about OTB100 please refer to the paper: Object Tracking Benchmark by Yi Wu, Jongwoo Lim, Ming-Hsuan Yang.

Train phase

I did not do any training myself, but I kept the baseline training code in the project. If you have the VID dataset or the YouTube-BB dataset, the training steps are as follows.

Create dataset:

python bin/create_dataset_ytbid.py --vid-dir /PATH/TO/ILSVRC2015 --ytb-dir /PATH/TO/YT-BB --output-dir /PATH/TO/SAVE_DATA --num_threads 6

Create lmdb:

python bin/create_lmdb.py --data-dir /PATH/TO/SAVE_DATA --output-dir /PATH/TO/RESULT.lmdb --num_threads 12

Train:

python bin/train_siamrpn.py --data_dir /PATH/TO/SAVE_DATA

Test phase

To test the tracker, first change the project path:

sys.path.append('[your_project_path]')

Then choose the combination of layers you want to use; the available combinations are defined in net/network.py.

Finally, pass your model path and dataset path and run:

python bin/test_OTB.py -ms [your_model_path] -v tb100 -d [your_dataset_path]
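The hard-coded path in the first step can be replaced by a small helper, sketched below. This is only a suggestion and not part of the project: `add_project_root` is a hypothetical name, and the commented usage assumes the calling script lives one directory below the project root (as the bin/ scripts appear to).

```python
import sys
from pathlib import Path


def add_project_root(root):
    """Put `root` at the front of sys.path once and return it as a string."""
    resolved = str(Path(root).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Inside a script under bin/, the project root could be derived from the
# script location instead of being edited by hand (assumed layout):
#     add_project_root(Path(__file__).resolve().parents[1])
project_root = add_project_root(".")
```

Calling the helper twice with the same directory leaves `sys.path` unchanged, so it is safe to use from several entry points.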

Environment

I've exported my Anaconda and pip environments to /env/conda_env.yaml and /env/pip_requirements.txt.

To use them, run the matching command below from the env directory.

For Anaconda:

conda env create -n [your_env_name] -f conda_env.yaml

For pip:

pip install -r pip_requirements.txt

Model Download

Model used by the baseline: https://pan.baidu.com/s/1vSvTqxaFwgmZdS00U3YIzQ key: v91k

Model after training for 50 epochs: https://pan.baidu.com/s/1m9ISra0B04jcmjW1n73fxg key: 0s03

Experimental Environment

(1)

DELL-Precision-7530

OS: Ubuntu 18.04 LTS

CPU: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

Memory: 2*8G DDR4 2666MHz

GPU: Nvidia Quadro P1000

(2)

HP OMEN

OS: Windows 10 Home Edition

CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Memory: 2*8G DDR4 2666MHz

GPU: Nvidia GeForce RTX 2060

Optimization

On Ubuntu and Quadro P1000

  • AUCs with model siamrpn_38.pth

    Layers      AUC
    baseline    0.610
    2+5         0.618
    2+3+5       0.607
    2+3+4+5     0.611

  • AUCs with model siamrpn_50.pth

    Layers      AUC
    baseline    0.600
    2+5         0.605
    2+3+5       0.594
    2+3+4+5     0.605

On Windows 10 and Nvidia GeForce RTX 2060

  • AUCs with model siamrpn_38.pth

    Layers      AUC
    baseline    0.610
    2+5         0.617
    2+3+5       0.607
    2+3+4+5     0.612

  • AUCs with model siamrpn_50.pth

    Layers      AUC
    baseline    0.597
    2+5         0.606
    2+3+5       0.597
    2+3+4+5     0.605
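The relative gain of each layer combination over its baseline can be read off these numbers with a few lines of Python. The sketch below uses only the AUC values reported above; the dictionary layout and the `relative_gain` helper are illustrative and not part of the project.

```python
# AUC values copied from the tables above, keyed by (platform, model).
results = {
    ("Ubuntu / Quadro P1000", "siamrpn_38.pth"): {
        "baseline": 0.610, "2+5": 0.618, "2+3+5": 0.607, "2+3+4+5": 0.611,
    },
    ("Ubuntu / Quadro P1000", "siamrpn_50.pth"): {
        "baseline": 0.600, "2+5": 0.605, "2+3+5": 0.594, "2+3+4+5": 0.605,
    },
    ("Windows 10 / RTX 2060", "siamrpn_38.pth"): {
        "baseline": 0.610, "2+5": 0.617, "2+3+5": 0.607, "2+3+4+5": 0.612,
    },
    ("Windows 10 / RTX 2060", "siamrpn_50.pth"): {
        "baseline": 0.597, "2+5": 0.606, "2+3+5": 0.597, "2+3+4+5": 0.605,
    },
}


def relative_gain(auc, baseline):
    """Relative change of `auc` with respect to `baseline` (0.01 == 1%)."""
    return (auc - baseline) / baseline


for (platform, model), aucs in results.items():
    base = aucs["baseline"]
    for layers, auc in aucs.items():
        if layers == "baseline":
            continue
        gain = relative_gain(auc, base) * 100
        print(f"{platform} | {model} | {layers}: {gain:+.1f}%")
```

In all four settings the 2+5 combination scores above its baseline, while 2+3+5 never does.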

Reference

[1] B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, "High Performance Visual Tracking with Siamese Region Proposal Network", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pages 8971-8980.

[2] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, "Feature Pyramid Networks for Object Detection", in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pages 2117-2125.

[3] Y. Wu, J. Lim, M.-H. Yang, "Object Tracking Benchmark", in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, pages 1834-1848.
