A PyTorch version of You Only Look at One-level Feature object detector

Overview

PyTorch_YOLOF

Input images must be resized so that the shorter side is 800 pixels and the longer side is at most 1333 pixels.
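As a rough illustration of the resize rule, the sketch below computes the target shape for a given image size. The function name `resized_shape` is hypothetical, and the fallback behaviour (shrinking the scale when the longer side would exceed 1333, so the shorter side ends up below 800) is an assumption based on the common detection convention, not something taken from this repository.

```python
def resized_shape(height, width, short_side=800, max_long_side=1333):
    """Return (new_height, new_width) for an aspect-preserving resize.

    Hypothetical helper: scales the shorter side to `short_side`, unless that
    would push the longer side past `max_long_side` (assumed convention).
    """
    scale = short_side / min(height, width)
    # If the longer side would exceed the cap, scale by the longer side instead.
    if max(height, width) * scale > max_long_side:
        scale = max_long_side / max(height, width)
    return round(height * scale), round(width * scale)


print(resized_shape(480, 640))    # shorter side reaches 800
print(resized_shape(500, 1500))   # longer side is capped at 1333
```

For very elongated images the cap on the longer side takes priority under this assumption, so the shorter side can end up smaller than 800.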

While reproducing YOLOF, I found that it uses many tricks that the baseline RetinaNet doesn't use. For example, YOLOF takes advantage of RandomShift, CTR_CLAMP, a large learning rate, a big batch size (such as 64), and a negative prediction threshold. Is it really fair for YOLOF to use these tricks when comparing against RetinaNet?

In other words, does YOLOF still work without those tricks?

Requirements

  • We recommend using Anaconda to create a conda environment:
conda create -n yolof python=3.6
  • Then, activate the environment:
conda activate yolof
  • Then, install the requirements:
pip install -r requirements.txt

PyTorch >= 1.1.0 and Torchvision >= 0.3.0

Visualize positive samples

You can run the following command to visualize the positive samples:

python train.py \
        -d voc \
        --batch_size 2 \
        --root path/to/your/dataset \
        --vis_targets

My Ablation Studies

Image mask

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: IoU Top4 (differs from the official matcher, which uses the top 4 by L1 distance)
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip

With the image mask, we ignore the loss of samples that lie outside the image.
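A minimal sketch of the masking idea, assuming each sample carries a flag saying whether it lies inside the image (for instance, not in a padded area of a batch). The function name `masked_mean_loss` and the choice to average over the kept samples are assumptions for illustration; the repository may normalize the loss differently.

```python
def masked_mean_loss(losses, in_image):
    """Average per-sample losses, skipping samples flagged as outside the image.

    `losses` and `in_image` are parallel lists; both names are hypothetical.
    """
    kept = [loss for loss, inside in zip(losses, in_image) if inside]
    # With no valid samples there is nothing to average, so return 0.0.
    return sum(kept) / len(kept) if kept else 0.0


# The third sample lies outside the image, so its large loss is ignored.
print(masked_mean_loss([1.0, 3.0, 100.0], [True, True, False]))
```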

Method           AP    AP50  AP75  APs   APm   APl
w/o mask         28.3  46.7  28.9  13.4  33.4  39.9
w/ mask          28.4  46.9  29.1  13.5  33.5  39.1

L1 Top4

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

IoU topk: For each label, we choose the k anchor boxes with the highest IoU as the positive samples.

L1 topk: For each label, we choose the k anchor boxes with the smallest L1 distance as the positive samples.
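The two matching rules can be sketched as follows for a single label. This is an illustration under stated assumptions, not the repository's matcher: boxes are `(x1, y1, x2, y2)` tuples, the L1 distance is taken between `(cx, cy, w, h)` representations, and the names `iou`, `l1_distance` and `topk_anchors` are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def l1_distance(a, b):
    """L1 distance between boxes in (cx, cy, w, h) form (assumed convention)."""
    def to_cxcywh(box):
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
    return sum(abs(p - q) for p, q in zip(to_cxcywh(a), to_cxcywh(b)))


def topk_anchors(anchors, gt_box, k=4, metric="iou"):
    """Return indices of the k anchors chosen as positives for one label."""
    idx = range(len(anchors))
    if metric == "iou":
        # Higher IoU is better, so sort in descending order.
        order = sorted(idx, key=lambda i: iou(anchors[i], gt_box), reverse=True)
    elif metric == "l1":
        # Smaller distance is better, so sort in ascending order.
        order = sorted(idx, key=lambda i: l1_distance(anchors[i], gt_box))
    else:
        raise ValueError("metric must be 'iou' or 'l1'")
    return order[:k]


anchors = [(0, 0, 10, 10), (1, 1, 11, 11), (5, 5, 15, 15),
           (50, 50, 60, 60), (0, 0, 12, 12)]
gt = (0, 0, 10, 10)
print(topk_anchors(anchors, gt, k=2, metric="iou"))  # picks anchors 0 and 4
print(topk_anchors(anchors, gt, k=2, metric="l1"))   # picks anchors 0 and 1
```

In this toy example the two rules disagree on the second positive: the larger anchor overlaps more, while the shifted anchor of the same size is closer in L1 distance.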

Method           AP    AP50  AP75  APs   APm   APl
IoU Top4         28.4  46.9  29.1  13.5  33.5  39.1
L1 Top4          28.6  46.9  29.4  13.8  34.0  39.0

RandomShift Augmentation

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip
  • with image mask

YOLOF takes advantage of the RandomShift augmentation, which is not used in RetinaNet.

Method           AP    AP50  AP75  APs   APm   APl
w/o RandomShift  28.6  46.9  29.4  13.8  34.0  39.0
w/ RandomShift   29.0  47.3  29.8  14.2  34.2  38.9

Fix a bug in the dataloader

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

I fixed a bug in the dataloader. Specifically, I set the shuffle in dataloader as False ...

Method           AP    AP50  AP75  APs   APm   APl
bug              29.0  47.3  29.8  14.2  34.2  38.9
no bug           30.1  49.0  31.0  15.2  36.3  39.8

Ignore samples

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

We ignore negative samples whose IoU with the labels is higher than the ignore threshold (igt).
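A minimal sketch of this rule, assuming the best IoU of every sample with any label has already been computed. The label encoding (1 = positive, 0 = negative, -1 = ignored) and the name `label_samples` are assumptions for illustration only.

```python
def label_samples(max_ious, positive_idx, ignore_thresh=0.7):
    """Assign 1 (positive), 0 (negative) or -1 (ignored) to each sample.

    `max_ious[i]` is the highest IoU of sample i with any label (assumed input).
    """
    positives = set(positive_idx)
    labels = []
    for i, best_iou in enumerate(max_ious):
        if i in positives:
            labels.append(1)
        elif best_iou > ignore_thresh:
            # Would-be negative that overlaps a label too much: excluded from the loss.
            labels.append(-1)
        else:
            labels.append(0)
    return labels


print(label_samples([0.9, 0.75, 0.3, 0.7], positive_idx=[0]))
```

Samples labelled -1 would simply be left out when the classification loss is computed.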

Method           AP    AP50  AP75  APs   APm   APl
no igt           30.1  49.0  31.0  15.2  36.3  39.8
igt=0.7

Decode boxes

  • Backbone: ResNet-50
  • image size: shorter side = 800, longer side <= 1333
  • Batch size: 16
  • lr: 0.01
  • lr of backbone: 0.01
  • SGD with momentum 0.9 and weight decay 1e-4
  • Matcher: L1 Top4
  • epoch: 12 (1x schedule)
  • lr decay: 8, 11
  • augmentation: RandomFlip + RandomShift
  • with image mask

Method-1: ctr_x = x_anchor + t_x, ctr_y = y_anchor + t_y

Method-2: ctr_x = x_anchor + t_x * w_anchor, ctr_y = y_anchor + t_y * h_anchor

Method-2 follows the operation used in YOLOF.
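The two center-decoding formulas above can be compared with a short sketch. The function name `decode_center` and the `(x_anchor, y_anchor, w_anchor, h_anchor)` tuple layout are assumptions for illustration; only the arithmetic comes from the formulas.

```python
def decode_center(anchor, t_x, t_y, scale_by_anchor_size):
    """Decode the predicted box center from an anchor and offsets (t_x, t_y).

    anchor is (x_anchor, y_anchor, w_anchor, h_anchor), a hypothetical layout.
    """
    x_a, y_a, w_a, h_a = anchor
    if scale_by_anchor_size:
        # Method-2: offsets are expressed in units of the anchor size.
        return x_a + t_x * w_a, y_a + t_y * h_a
    # Method-1: offsets are added directly to the anchor center.
    return x_a + t_x, y_a + t_y


anchor = (100.0, 200.0, 32.0, 64.0)
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor_size=False))  # (100.5, 199.75)
print(decode_center(anchor, 0.5, -0.25, scale_by_anchor_size=True))   # (116.0, 184.0)
```

With Method-2 the same offset moves the center further for larger anchors, so the regression target is normalized by anchor size.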

Method AP AP50 AP75 APs APm APl
Method-1
Method-2

Train

sh train.sh

You can change the configuration in train.sh.

If you just want to check which anchor boxes are assigned as positive samples, you can run:

python train.py --cuda -d voc --batch_size 8 --vis_targets

Adjust the commands above as needed for your own setup.

Test

python test.py -d [select a dataset: voc or coco] \
               --cuda \
               -v [select a model] \
               --weight [ Please input the path to model dir. ] \
               --img_size 800 \
               --root path/to/dataset/ \
               --show

You can run the above command to visualize the detection results on the dataset.

Comments
  • fix typo

    When I run the evaluation on the VOC dataset, an error occurs:

    Traceback (most recent call last):
      File "eval.py", line 126, in <module>
        voc_test(model, data_dir, device, transform)
      File "eval.py", line 42, in voc_test
        display=True)
    TypeError: __init__() got an unexpected keyword argument 'data_root'
    

    I discovered that this was due to a typo and simply fixed it. Everything is going well now.

    opened by guohanli 1
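  • There is a problem with the label generation function

    The label generation logic in the original source code is:
    1. Select the top-k anchors by the L1 distance between the predicted boxes and the gt, then select the top-k anchors by the L1 distance between the anchors and the gt, and use these as the candidate positive anchors.
    2. Match the candidate positive anchors to the gt by IoU, and filter out the candidates whose anchor-to-gt IoU is below 0.15.
    3. Set as negative anchors those whose predicted boxes have IoU <= 0.7 with the gt.
    (You only use the anchors, with no pre-selection and without the predicted boxes.)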

    opened by Mr-Z-NewStar 11
Owner
Jianhua Yang