R3Det based on mmdet 2.19.0

Overview

R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

Installation

# install mmdetection first if you haven't installed it yet. (Refer to mmdetection for details.)
pip install mmdet==2.19.0

# install r3det (Compiling rotated ops is a little time-consuming.)
pip install -r requirements.txt
pip install -v -e .
  • OpenCV changed its angle representation in version 4.5.1, so it is best to use an opencv-python release newer than 4.5.1. All of the following experiments were run with 4.5.3.
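
The version requirement above can be checked before training. The snippet below is a minimal sketch, not part of this repository: the helper names `version_tuple` and `is_recommended_opencv` are hypothetical, and the strict "newer than 4.5.1" threshold simply mirrors the wording of the note.

```python
import re


def version_tuple(version):
    """Turn a string such as '4.5.3' or '4.5.3.56' into (major, minor, patch)."""
    return tuple(int(n) for n in re.findall(r"\d+", version)[:3])


def is_recommended_opencv(version, threshold="4.5.1"):
    """Return True when `version` is strictly newer than `threshold`."""
    return version_tuple(version) > version_tuple(threshold)


# Typical use (requires opencv-python to be installed):
#   import cv2
#   print(is_recommended_opencv(cv2.__version__))
```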

Quick Start

Please change the data paths in the configs to point to your own dataset.

# train
CUDA_VISIBLE_DEVICES=0 PORT=29500 \
./tools/dist_train.sh configs/rretinanet/rretinanet_obb_r50_fpn_1x_dota_v3.py 1

# submission
CUDA_VISIBLE_DEVICES=0 PORT=29500 \
./tools/dist_test.sh configs/rretinanet/rretinanet_obb_r50_fpn_1x_dota_v3.py \
        work_dirs/rretinanet_obb_r50_fpn_1x_dota_v3/epoch_12.pth 1 --format-only \
        --eval-options submission_dir=work_dirs/rretinanet_obb_r50_fpn_1x_dota_v3/Task1_results

For the DOTA dataset, please crop the original images into 1024×1024 patches with an overlap of 200 pixels by running:

python tools/split/img_split.py --base_json \
       tools/split/split_configs/dota1_0/ss_trainval.json

python tools/split/img_split.py --base_json \
       tools/split/split_configs/dota1_0/ss_test.json

Please change the paths in ss_trainval.json and ss_test.json to your own paths. (The split tool is forked from BboxToolkit, which is faster than DOTA_Devkit.)
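
To make the 1024/200 setting concrete, the sketch below computes where patches would start along one image axis. It illustrates a generic sliding window with a stride of 1024 - 200 = 824 pixels and is not the code in tools/split/img_split.py; the function name `window_starts` and the handling of the last window (shifted back to stay inside the image) are assumptions.

```python
def window_starts(length, patch=1024, overlap=200):
    """Start offsets of sliding windows along one axis of an image.

    Assumption: the final window is shifted back so that it ends exactly
    at the image border instead of being padded.
    """
    stride = patch - overlap  # 824 for the settings used above
    if length <= patch:
        return [0]  # a single (possibly padded) patch covers the axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # cover the remaining border
    return starts


# Patches along the width of a 4000-pixel-wide image.
print(window_starts(4000))
```

With this scheme, neighbouring patches share at least 200 pixels, so an object cut by one patch border is usually fully visible in the adjacent patch.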

Angle Representations

Three angle representations are built in and can be switched freely in the config.

  • v1 (from R3Det): [-PI/2, 0)
  • v2 (from S2ANet): [-PI/4, 3PI/4)
  • v3 (from OBBDetection): [-PI/2, PI/2)
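
As a rough illustration of the three ranges, the sketch below wraps an arbitrary angle (in radians) into the interval of a given version. It is not the repository's implementation: `wrap_angle` and `ANGLE_RANGES` are hypothetical names, and only the angle is handled. In particular, the v1 range has a period of PI/2, so a real conversion also has to swap width and height when the angle moves by an odd number of quarter turns, which this sketch does not do.

```python
import math

# Half-open intervals [low, high) taken from the list above.
ANGLE_RANGES = {
    "v1": (-math.pi / 2, 0.0),
    "v2": (-math.pi / 4, 3 * math.pi / 4),
    "v3": (-math.pi / 2, math.pi / 2),
}


def wrap_angle(theta, version):
    """Wrap `theta` into the angle range of `version` (angle only)."""
    low, high = ANGLE_RANGES[version]
    period = high - low  # PI/2 for v1, PI for v2 and v3
    return (theta - low) % period + low


for name in ("v1", "v2", "v3"):
    print(name, wrap_angle(math.pi / 3, name))
```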

The differences among the three angle representations are reflected in poly2obb, obb2poly, obb2xyxy, obb2hbb, hbb2obb, etc. Following the three papers above, their box coders also differ:

  • DeltaXYWHAOBBoxCoder
    • v1: None
    • v2: Constrained angle + Projection of dx and dy + Normalized with PI
    • v3: Constrained angle and length & width + Projection of dx and dy
  • DeltaXYWHAHBBoxCoder
    • v1: None
    • v2: Constrained angle + Normalized with PI
    • v3: Constrained angle and length & width + Normalized with 2PI
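
To give a feel for what "Projection of dx and dy" and "Normalized with PI" can mean, here is a minimal sketch of a rotated box coder. It is an illustration under stated assumptions, not the DeltaXYWHAOBBoxCoder from this repository: boxes are assumed to be (cx, cy, w, h, angle) tuples with the angle in radians, the centre offset is projected onto the anchor's own axes, the angle delta is divided by PI, and the "constrained angle" step is omitted. The names `encode` and `decode` are hypothetical.

```python
import math


def encode(anchor, gt):
    """Encode a ground-truth box as deltas relative to a rotated anchor."""
    px, py, pw, ph, pa = anchor
    gx, gy, gw, gh, ga = gt
    cos_a, sin_a = math.cos(pa), math.sin(pa)
    # Project the centre offset onto the anchor's width and height axes.
    dx = (cos_a * (gx - px) + sin_a * (gy - py)) / pw
    dy = (-sin_a * (gx - px) + cos_a * (gy - py)) / ph
    dw = math.log(gw / pw)
    dh = math.log(gh / ph)
    da = (ga - pa) / math.pi  # angle delta normalized with PI
    return dx, dy, dw, dh, da


def decode(anchor, deltas):
    """Invert `encode`: recover the box from the anchor and the deltas."""
    px, py, pw, ph, pa = anchor
    dx, dy, dw, dh, da = deltas
    cos_a, sin_a = math.cos(pa), math.sin(pa)
    gx = px + pw * dx * cos_a - ph * dy * sin_a
    gy = py + pw * dx * sin_a + ph * dy * cos_a
    gw = pw * math.exp(dw)
    gh = ph * math.exp(dh)
    ga = pa + da * math.pi
    return gx, gy, gw, gh, ga


anchor = (100.0, 80.0, 40.0, 20.0, 0.3)
gt = (108.0, 75.0, 50.0, 18.0, 0.5)
print(decode(anchor, encode(anchor, gt)))  # should reproduce gt up to rounding
```

Without the projection, dx and dy would be measured along the image axes, so the same physical offset would produce different regression targets for anchors with different orientations.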

We believe that the different coders are the key reason for the different baselines reported in different papers. The good news is that all of the above coders can be switched freely in R3Det. In addition, R3Det provides 4 NMS ops and 3 IoU calculators for rotated detection, as follows:

  • nms.type
    • v1: v1
    • v2: v2
    • v3: v3
    • mmcv: mmcv
  • iou_calculator
    • v1: RBboxOverlaps2D_v1
    • v2: RBboxOverlaps2D_v2
    • v3: RBboxOverlaps2D_v3
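
Because the configs are Python files, the matching NMS and IoU calculator entries for one angle version can be assembled with a small helper. The sketch below only reproduces the names listed above; the helper `rotation_ops` is hypothetical, and where these dicts are placed inside a full config (for example in the test settings and the assigner) is an assumption that should be checked against the shipped configs.

```python
def rotation_ops(version):
    """Return the NMS and IoU calculator entries that match an angle version."""
    if version not in ("v1", "v2", "v3"):
        raise ValueError(f"unknown angle version: {version!r}")
    return dict(
        nms=dict(type=version),
        iou_calculator=dict(type=f"RBboxOverlaps2D_{version}"),
    )


print(rotation_ops("v3"))
```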

Performance

DOTA1.0 (Task1)
Model           Backbone  Lr schd  MS  RR  Angle  box AP  Official  Download
RRetinaNet HBB  R50-FPN   1x       -   -   v1     65.19   65.73     Baidu:0518/Google
RRetinaNet OBB  R50-FPN   1x       -   -   v3     68.20   69.40     Baidu:0518/Google
RRetinaNet OBB  R50-FPN   1x       -   -   v2     68.64   68.40     Baidu:0518/Google
R3Det           R50-FPN   1x       -   -   v1     70.41   70.66     Baidu:0518/Google
R3Det*          R50-FPN   1x       -   -   v1     70.86   -         Baidu:0518/Google
  • MS means multi-scale image splitting.
  • RR means random rotation.

Citation

@inproceedings{yang2021r3det,
    title={R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object},
    author={Yang, Xue and Yan, Junchi and Feng, Ziming and He, Tao},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={35},
    number={4},
    pages={3163--3171},
    year={2021}
}

Owner
SJTU-Thinklab-Det