DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Overview

DETReg: Unsupervised Pretraining with Region Priors for Object Detection

Amir Bar, Xin Wang, Vadim Kantorov, Colorado J Reed, Roei Herzig, Gal Chechik, Anna Rohrbach, Trevor Darrell, Amir Globerson

DETReg

This repository is the implementation of DETReg, see Project Page.

Release

  • COCO training code and eval - DONE
  • Pretrained models - DONE
  • Pascal VOC training code and eval- TODO

Introduction

DETReg is an unsupervised pretraining approach for object DEtection with TRansformers using Region priors. Motivated by the two tasks underlying object detection: localization and categorization, we combine two complementary signals for self-supervision. For an object localization signal, we use pseudo ground truth object bounding boxes from an off-the-shelf unsupervised region proposal method, Selective Search, which does not require training data and can detect objects at a high recall rate and very low precision. The categorization signal comes from an object embedding loss that encourages invariant object representations, from which the object category can be inferred. We show how to combine these two signals to train the Deformable DETR detection architecture from large amounts of unlabeled data. DETReg improves the performance over competitive baselines and previous self-supervised methods on standard benchmarks like MS COCO and PASCAL VOC. DETReg also outperforms previous supervised and unsupervised baseline approaches on low-data regime when trained with only 1%, 2%, 5%, and 10% of the labeled data on MS COCO.

Installation

Requirements

  • Linux, CUDA>=9.2, GCC>=5.4

  • Python>=3.7

    We recommend you to use Anaconda to create a conda environment:

    conda create -n detreg python=3.7 pip

    Then, activate the environment:

    conda activate detreg

    Installation: (change cudatoolkit to your cuda version. For detailed pytorch installation instructions click here)

    conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
  • Other requirements

    pip install -r requirements.txt

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
python test.py

Usage

Dataset preparation

Please download COCO 2017 dataset and ImageNet and organize them as following:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Note that in this work we used the ImageNet100 dataset, which is x10 smaller than ImageNet. To create ImageNet100 run the following command:

mkdir -p data/ilsvrc100/train
mkdir -p data/ilsvrc100/val
while read line; do ln -s <code_root>/data/ilsvrc/train/$line <code_root>/data/ilsvrc100/train/$line; done < <code_root>/datasets/category.txt
while read line; do ln -s <code_root>/data/ilsvrc/val/$line <code_root>/data/ilsvrc100/val/$line; done < <code_root>/datasets/category.txt

This should results with the following structure:

code_root/
└── data/
    ├── ilsvrc/
          ├── train/
          └── val/
    ├── ilsvrc100/
          ├── train/
          └── val/
    └── MSCoco/
        ├── train2017/
        ├── val2017/
        └── annotations/
        	├── instances_train2017.json
        	└── instances_val2017.json

Create ImageNet Selective Search boxes:

Download the precomputed ImageNet boxes and extract in the cache folder:

mkdir -p /cache/ilsvrc && cd /cache/ilsvrc 
wget https://github.com/amirbar/DETReg/releases/download/1.0.0/ss_box_cache.tar.gz
tar -xf ss_box_cache.tar.gz

Alternatively, you can compute Selective Search boxes yourself:

To create selective search boxes for ImageNet100 on a single machine, run the following command (set num_processes):

python -m datasets.cache_ss --dataset imagenet100 --part 0 --num_m 1 --num_p <num_processes_to_use> 

To speed up the creation of boxes, change the arguments accordingly and run the following command on each different machine:

python -m datasets.cache_ss --dataset imagenet100 --part <machine_number> --num_m <num_machines> --num_p <num_processes_to_use> 

The cached boxes are saved in the following structure:

code_root/
└── cache/
    └── ilsvrc/

Training

The command for pretraining DETReg on 8 GPUs on ImageNet100 is as following:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_top30_in100.sh --batch_size 24 --num_workers 8

Training takes around 1.5 days with 8 NVIDIA V100 GPUs, you can download a pretrained model (see below) if you want to skip this step.

After pretraining, a checkpoint is saved in exps/DETReg_top30_in100/checkpoint.pth. To fine tune it over different coco settings use the following commands: Fine tuning on full COCO (should take 2 days with 8 NVIDIA V100 GPUs):

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/DETReg_fine_tune_full_coco.sh

For smaller subsets which trains faster, you can use smaller number of gpus (e.g 4 with batch size 2)/ Fine tuning on 1%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_1pct_coco.sh --batch_size 2

Fine tuning on 2%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_2pct_coco.sh --batch_size 2

Fine tuning on 5%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_5pct_coco.sh --batch_size 2

Fine tuning on 10%

GPUS_PER_NODE=4 ./tools/run_dist_launch.sh 4 ./configs/DETReg_fine_tune_10pct_coco.sh --batch_size 2

Evaluation

To evaluate a finetuned model, use the following command from the project basedir:

./configs/<config file>.sh --resume exps/<config file>/checkpoint.pth --eval

Pretrained Models

Cite

If you found this code helpful, feel free to cite our work:

@misc{bar2021detreg,
      title={DETReg: Unsupervised Pretraining with Region Priors for Object Detection},
      author={Amir Bar and Xin Wang and Vadim Kantorov and Colorado J Reed and Roei Herzig and Gal Chechik and Anna Rohrbach and Trevor Darrell and Amir Globerson},
      year={2021},
      eprint={2106.04550},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Related Works

If you found DETReg useful, consider checking out these related works as well: ReSim, SwAV, DETR, UP-DETR, and Deformable DETR.

Acknowlegments

DETReg builds on previous works code base such as Deformable DETR and UP-DETR. If you found DETReg useful please consider citing these works as well.

Comments
  • Question about reproducing the Semi-supervised Learning experiment

    Question about reproducing the Semi-supervised Learning experiment

    When i using this checkpoint as pretrain

    image

    and using these script to reproducing the Semi-supervised Learning experiment

    image

    the result turns out to be huge difference :

    image

    Please help me, did i missing anything in reproducing ?

    By the way, i can reproduce the full COCO result @45.5AP. So the conda env is probably right.

    opened by 4-0-4-notfound 5
  • Question about selective search cached boxes in training and validation

    Question about selective search cached boxes in training and validation

    Why are there some '.npy' files for the Imagnet validation set in 'ss_box_cache.tar.gz', for example ILSVRC2012_val_00000006.npy. Are training sets and validation sets used for pretraining?

    opened by CQIITLAB 3
  • RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3

    Hi, I'm trying to run the pretraining but I receive a mismatch size here https://github.com/amirbar/DETReg/blob/main/models/deformable_detr.py#L328, src_features has a shape of torch.Size([228, 512]) and target_features a shape of torch.Size([228, 3, 128, 128]). Is this ok?

    Start training /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.) return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode) /home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) /home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py:329: UserWarning: Using a target size (torch.Size([228, 3, 128, 128])) that is different to the input size (torch.Size([228, 512])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} Traceback (most recent call last): File "main.py", line 403, in main(args) File "main.py", line 314, in main model, swav_model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm) File "/home/jossalgon/notebooks/unsupervised/DETReg/engine.py", line 50, in train_one_epoch loss_dict = criterion(outputs, targets) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 406, in forward losses.update(self.get_loss(loss, outputs, targets, indices, num_boxes, **kwargs)) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 381, in get_loss return loss_map[loss](outputs, targets, indices, num_boxes, **kwargs) File "/home/jossalgon/notebooks/unsupervised/DETReg/models/deformable_detr.py", line 329, in loss_object_embedding_loss return {'object_embedding_loss': torch.nn.functional.l1_loss(src_features, target_features, reduction='mean')} File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/nn/functional.py", line 3058, in l1_loss expanded_input, expanded_target = torch.broadcast_tensors(input, target) File "/home/jossalgon/my-envs/detreg/lib/python3.7/site-packages/torch/functional.py", line 73, in broadcast_tensors return _VF.broadcast_tensors(tensors) # type: ignore[attr-defined] RuntimeError: The size of tensor a (512) must match the size of tensor b (128) at non-singleton dimension 3 Traceback (most recent call last): File "./tools/launch.py", line 192, in main() File "./tools/launch.py", line 188, in main cmd=process.args)

    Using: cudatoolkit 11.1.74 h6bb024c_0 nvidia/linux-64 pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch/linux-64 torchaudio 0.9.0 py37 pytorch/linux-64 torchvision 0.10.0 py37_cu111 pytorch/linux-64

    Thanks and great work!

    opened by jossalgon 3
  • error occured when following the Compiling CUDA operators step.

    error occured when following the Compiling CUDA operators step.

    Hello, when I try to run sh ./make.sh by following the Compiling CUDA operators it always show the error, Traceback (most recent call last): File "setup.py", line 69, in ext_modules=get_extensions(), File "setup.py", line 47, in get_extensions raise NotImplementedError('Cuda is not availabel') NotImplementedError: Cuda is not availabel

    Any idea why this happens? Thanks!

    opened by ruizhaoz 2
  • Can`t install MultiScaleDeformableAttention

    Can`t install MultiScaleDeformableAttention

    Hi, I have seen others have resolved this, but no clues are left behind. I was not able to install the MultiScaleDeformableAttention package from pip or conda, and there is nothing in Readme.

    Please assist. Thank you!

    opened by jshtok 2
  • Results between IN100 and IN1k setting

    Results between IN100 and IN1k setting

    In the arXiv v1 version, the fine-tune result on COCO is 45.5 with IN100 pretrain. But in the arXiv v2 version, it seems the fine-tune result on COCO is still 45.5, but the pretrain dataset is IN1k. So, in my understanding, with more pretrain data, but the fine-tune result is not improved?

    opened by 4-0-4-notfound 2
  • Fine Tuning the Model on a fraction of VOC

    Fine Tuning the Model on a fraction of VOC

    Hi @amirbar,

    Thank You for the great work. It looks like the parameter --filter_pct has never been used in the code. It means the code effectively running fine-tuning on whole VOC/COCO datasets. Please correct me if I am wrong.

    Thanks

    opened by mmaaz60 2
  • It seems in the pretrain stage the network output 90 categories instead of 2

    It seems in the pretrain stage the network output 90 categories instead of 2

    Hello, It seems the network output 90 categories instead of 2, in the pretrain stage. In the paper, it supposes to output 2 categories (either back gourd or foreground), which is not true in the code. I'm so confused, Am i missing something?

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/configs/DETReg_top30_in100.sh#L8

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/main.py#L120

    https://github.com/amirbar/DETReg/blob/490e40403860d51c19333b5db53bcd0ee23647ad/models/deformable_detr.py#L497-L503

    opened by 4-0-4-notfound 2
  • Pretrained model on ImageNet-1K

    Pretrained model on ImageNet-1K

    Hi, Thank you for sharing your great work. I am conducting a study on the features of DETReg, and wanted to explore the performance with the pretrained model trained on the full ImageNet. I was wondering if you could share an ImageNet-1K pretrained model?

    Thank you

    opened by hanoonaR 2
  • Bug: Target[

    Bug: Target["area"] incorrect when using selective_search (and possibly others)

    The selective_search function changes the boxes to xyxy coordinates. boxes[..., 2] = boxes[..., 0] + boxes[..., 2] boxes[..., 3] = boxes[..., 1] + boxes[..., 3]

    In [get_item] (https://github.com/amirbar/DETReg/blob/36ae5844183499f6bc1a6d8922427b0f473e06d9/datasets/selfdet.py#L67)
    we have boxes = selective_search(img, h, w, res_size=128) ... target['boxes'] = torch.tensor(boxes) ... target['area'] = target['boxes'][..., 2] * target['boxes'][..., 3]

    But boxes at this point on in xyxy not cxcywh, So the "area" is incorrect. I do not know if this effects anything down the line, it may not.

    opened by AZaitzeff 1
  • What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    What is the difference between 'head' and 'intermediate' in 'obj_embedding_head'?

    https://github.com/amirbar/DETReg/blob/0a258d879d8981b27ab032b83defc6dfcbf07d35/models/backbone.py#L156-L177

    It seems 'head' is the new training setting that uses dim=128 to align features. But dim=512 ('intermediate') is used in the paper. Does it mean that we should change to dim=128 ('head') to achieve better performance of DETReg?

    Thanks.

    opened by Cohesion97 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Fine-tuning based on the DETR architecture code, but the verification indicators are all 0

    Thanks for your work. I noticed that you open-sourced the detreg of the DETR architecture, and then I tried to use the pre-trained model on the imagenet dataset you provided to fine-tune training for my custom dataset. But I found that all the indicators are still 0 after more than fifty batches of pre-training. I have followed the tips in the related issues of DETR (https://github.com/facebookresearch/detr/issues?page=1&q=zero) , the num_calss was modified. Many people mentioned that DETR requires a large amount of training data, or fine-tuning. But I am currently using fine-tuning, and the number of fine-tuning datasets is about one thousand. But the effect is still very poor, may I ask why. It's normal for me to use deformable-detr architecture. image

    opened by Flyooofly 0
  •   checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    checkpoint_args = torch.load(args.resume, map_location='cpu')['args'] KeyError: 'args'???

    1: checkpoint_args = torch.load(args.resume,map_location='cpu')['args'] KeyError: 'args' I have this kind of error report in the evaluation stage, I don't know how to deal with it, I hope the owner can help me to solve it, thank you very much. 2: Does the test effect of a single GPU appear to be reduced?

    opened by 873552584 0
  • About few-shot object detection

    About few-shot object detection

    I found the result of few-shot object detection is better than others, could you release the few-shot object detection code? or hyperparameters? or how to import novel and base datasets? thanks :)

    opened by YAOSL98 0
  • urllib.error.HTTPError: HTTP Error 403: Forbidden

    urllib.error.HTTPError: HTTP Error 403: Forbidden

    Downloading: "https://dl.fbaipublicfiles.com/deepcluster/swav_800ep_pretrain.pth.tar" to C:\Users\pc/.cache\torch\hub\checkpoints\swav_800ep_pretrain.pth.tar urllib.error.HTTPError: HTTP Error 403: Forbidden hope to solve,thanks

    opened by GDzhu01 0
Releases(1.0.0)
Let's create a tool to convert Thailand budget from PDF to CSV.

thailand-budget-pdf2csv Let's create a tool to convert Thailand Government Budgeting from PDF to CSV! รวมพลัง Dev แปลงงบ จาก PDF สู่ Machine-readable

Kao.Geek 88 Dec 19, 2022
code release for USENIX'22 paper `On the Security Risks of AutoML`

This project is a minimized runnable project cut from trojanzoo, which contains more datasets, models, attacks and defenses. This repo will not be mai

Ren Pang 5 Apr 19, 2022
The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 Spotlight Paper) Zhisheng

NVIDIA Research Projects 45 Dec 26, 2022
A Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Training Data》

RangeLoss Pytorch This is a Pytorch reproduction of Range Loss, which is proposed in paper 《Range Loss for Deep Face Recognition with Long-Tailed Trai

Youzhi Gu 7 Nov 27, 2021
Advanced yabai wooting scripts

Yabai Wooting scripts Installation requirements Both https://github.com/xiamaz/python-yabai-client and https://github.com/xiamaz/python-wooting-rgb ne

Max Zhao 3 Dec 31, 2021
Libtorch yolov3 deepsort

Overview It is for my undergrad thesis in Tsinghua University. There are four modules in the project: Detection: YOLOv3 Tracking: SORT and DeepSORT Pr

Xu Wei 226 Dec 13, 2022
Pytorch code for semantic segmentation using ERFNet

ERFNet (PyTorch version) This code is a toolbox that uses PyTorch for training and evaluating the ERFNet architecture for semantic segmentation. For t

Edu 394 Jan 01, 2023
Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evalua

Autonomous Agents Research Group (University of Edinburgh) 2 Oct 09, 2022
SwinTrack: A Simple and Strong Baseline for Transformer Tracking

SwinTrack This is the official repo for SwinTrack. A Simple and Strong Baseline Prerequisites Environment conda (recommended) conda create -y -n SwinT

LitingLin 196 Jan 04, 2023
SOFT: Softmax-free Transformer with Linear Complexity, NeurIPS 2021 Spotlight

SOFT: Softmax-free Transformer with Linear Complexity SOFT: Softmax-free Transformer with Linear Complexity, Jiachen Lu, Jinghan Yao, Junge Zhang, Xia

Fudan Zhang Vision Group 272 Dec 25, 2022
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection. CVPR 2022

PGNet Pyramid Grafting Network for One-Stage High Resolution Saliency Detection. CVPR 2022, CVPR 2022 (arXiv 2204.05041) Abstract Recent salient objec

CVTEAM 109 Dec 05, 2022
DeepMReye: magnetic resonance-based eye tracking using deep neural networks

DeepMReye: magnetic resonance-based eye tracking using deep neural networks

73 Dec 21, 2022
Synthesize photos from PhotoDNA using machine learning 🌱

Ribosome Synthesize photos from PhotoDNA. See the blog post for more information. Installation Dependencies You can install Python dependencies using

Anish Athalye 112 Nov 23, 2022
Implementation of Deep Deterministic Policy Gradiet Algorithm in Tensorflow

ddpg-aigym Deep Deterministic Policy Gradient Implementation of Deep Deterministic Policy Gradiet Algorithm (Lillicrap et al.arXiv:1509.02971.) in Ten

Steven Spielberg P 247 Dec 07, 2022
The best solution of the Weather Prediction track in the Yandex Shifts challenge

yandex-shifts-weather The repository contains information about my solution for the Weather Prediction track in the Yandex Shifts challenge https://re

Ivan Yu. Bondarenko 15 Dec 18, 2022
Tesla Light Show xLights Guide With python

Tesla Light Show xLights Guide Welcome to the Tesla Light Show xLights guide! You can create and run your own light shows on Tesla vehicles. Running a

Tesla, Inc. 2.5k Dec 29, 2022
Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors

-IEEE-TIM-2021-1-Shallow-CNN-for-HAR [IEEE TIM 2021-1] Shallow Convolutional Neural Networks for Human Activity Recognition using Wearable Sensors All

Wenbo Huang 1 May 17, 2022
Open-source code for Generic Grouping Network (GGN, CVPR 2022)

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity Pytorch implementation for "Open-World Instance Segmen

Meta Research 99 Dec 06, 2022
PyTorch(Geometric) implementation of G^2GNN in "Imbalanced Graph Classification via Graph-of-Graph Neural Networks"

This repository is an official PyTorch(Geometric) implementation of G^2GNN in "Imbalanced Graph Classification via Graph-of-Graph Neural Networks". Th

Yu Wang (Jack) 13 Nov 18, 2022
Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond

CRF - Conditional Random Fields A library for dense conditional random fields (CRFs). This is the official accompanying code for the paper Regularized

Đ.Khuê Lê-Huu 21 Nov 26, 2022