Location-Sensitive Visual Recognition with Cross-IOU Loss

Overview

The trained models are temporarily unavailable, but you can train them yourself with the provided code using reasonable computational resources.

by Kaiwen Duan, Lingxi Xie, Honggang Qi, Song Bai, Qingming Huang and Qi Tian

The code to train and evaluate the proposed LSNet is available here. For more technical details, please refer to our arXiv paper.

Location-sensitive visual recognition tasks, including object detection, instance segmentation, and human pose estimation, can all be formulated as localizing an anchor point (in red) and a set of landmarks (in green). Our work aims to offer a unified framework for these tasks.
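
To make the anchor-plus-landmarks formulation concrete, here is a minimal Python sketch. It assumes each landmark is stored as a (dx, dy) offset from the anchor point, which is an illustrative convention rather than something taken from the LSNet code, and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def decode_landmarks(anchor: Point, offsets: List[Point]) -> List[Point]:
    """Turn per-landmark (dx, dy) offsets into absolute image coordinates."""
    ax, ay = anchor
    return [(ax + dx, ay + dy) for dx, dy in offsets]


def enclosing_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) of the tightest box around the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical example: one anchor point and four extreme-point offsets.
anchor = (50.0, 40.0)
extreme_offsets = [(-20.0, 5.0), (3.0, -15.0), (25.0, -2.0), (-4.0, 18.0)]

landmarks = decode_landmarks(anchor, extreme_offsets)
box = enclosing_box(landmarks)
```

Under this reading, the three tasks differ mainly in what the landmarks are: a few extreme points for a box, a denser set of boundary points for a contour, or body keypoints for a pose.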

Abstract

Object detection, instance segmentation, and pose estimation are popular visual recognition tasks which require localizing the object by internal or boundary landmarks. This paper summarizes these tasks as location-sensitive visual recognition and proposes a unified solution named location-sensitive network (LSNet). Based on a deep neural network as the backbone, LSNet predicts an anchor point and a set of landmarks which together define the shape of the target object. The key to optimizing LSNet lies in the ability to fit various scales, for which we design a novel loss function named cross-IOU loss that computes the cross-IOU of each anchor-landmark pair to approximate the global IOU between the prediction and the ground truth. The flexibly located and accurately predicted landmarks also enable LSNet to incorporate richer contextual information for visual recognition. Evaluated on the MS-COCO dataset, LSNet sets a new state of the art in accuracy for anchor-free object detection (53.5% box AP) and instance segmentation (40.2% mask AP), and shows promising performance in detecting multi-scale human poses.
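
The abstract describes the cross-IOU loss only at a high level, so the following is a rough sketch of one way a per-pair, IOU-like score could be computed, not the implementation in this repository. It assumes each anchor-to-landmark offset is split into four non-negative directional components, that the component opposite to the offset direction is kept at a small fraction `alpha` of the magnitude, and that the score is the ratio of element-wise minima to maxima. The helper names and the value of `alpha` are assumptions; please check the paper for the exact definition.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def to_four_components(vec: Vector, alpha: float = 0.2) -> List[float]:
    """Split a (dx, dy) offset into [left, right, up, down] magnitudes.

    The direction opposite to the offset gets alpha times the magnitude
    (an assumed choice) so that opposite-pointing vectors still overlap a bit.
    """
    dx, dy = vec
    ax, ay = abs(dx), abs(dy)
    right, left = (ax, alpha * ax) if dx >= 0 else (alpha * ax, ax)
    down, up = (ay, alpha * ay) if dy >= 0 else (alpha * ay, ay)
    return [left, right, up, down]


def cross_iou(pred: Vector, gt: Vector, alpha: float = 0.2, eps: float = 1e-9) -> float:
    """IOU-like score in [0, 1] for one predicted / ground-truth offset pair."""
    p = to_four_components(pred, alpha)
    g = to_four_components(gt, alpha)
    intersection = sum(min(a, b) for a, b in zip(p, g))
    union = sum(max(a, b) for a, b in zip(p, g))
    return intersection / (union + eps)


def cross_iou_loss(preds: Sequence[Vector], gts: Sequence[Vector], alpha: float = 0.2) -> float:
    """Average (1 - cross_iou) over all anchor-landmark pairs of one object."""
    scores = [cross_iou(p, g, alpha) for p, g in zip(preds, gts)]
    return 1.0 - sum(scores) / len(scores)
```

Because the score is a ratio, scaling both offsets by the same factor leaves it unchanged, which is the scale-fitting property the abstract emphasizes.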

If you encounter any problems in using our code, please contact Kaiwen Duan: [email protected]

Bbox AP(%) on COCO test-dev

| Method | Backbone | Epoch | MStrain | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|---|
| Anchor-based: | | | | | | | | | |
| Libra R-CNN | X-101-64x4d | 12 | N | 43.0 | 64.0 | 47.0 | 25.3 | 45.6 | 54.6 |
| AB+FSAF* | X-101-64x4d | 18 | Y | 44.6 | 65.2 | 48.6 | 29.7 | 47.1 | 54.6 |
| FreeAnchor* | X-101-32x8d | 24 | Y | 47.3 | 66.3 | 51.5 | 30.6 | 50.4 | 59.0 |
| GFLV1* | X-101-32x8d | 24 | Y | 48.2 | 67.4 | 52.6 | 29.2 | 51.7 | 60.2 |
| ATSS* | X-101-64x4d-DCN | 24 | Y | 50.7 | 68.9 | 56.3 | 33.2 | 52.9 | 62.4 |
| PAA* | X-101-64x4d-DCN | 24 | Y | 51.4 | 69.7 | 57.0 | 34.0 | 53.8 | 64.0 |
| GFLV2* | R2-101-DCN | 24 | Y | 53.3 | 70.9 | 59.2 | 35.7 | 56.1 | 65.6 |
| YOLOv4-P7* | CSP-P7 | 450 | Y | 56.0 | 73.3 | 61.2 | 38.9 | 60.0 | 68.6 |
| Anchor-free: | | | | | | | | | |
| ExtremeNet* | HG-104 | 200 | Y | 43.2 | 59.8 | 46.4 | 24.1 | 46.0 | 57.1 |
| RepPointsV1* | R-101-DCN | 24 | Y | 46.5 | 67.4 | 50.9 | 30.3 | 49.7 | 57.1 |
| SAPD | X-101-64x4d-DCN | 24 | Y | 47.4 | 67.4 | 51.1 | 28.1 | 50.3 | 61.5 |
| CornerNet* | HG-104 | 200 | Y | 42.1 | 57.8 | 45.3 | 20.8 | 44.8 | 56.7 |
| DETR | R-101 | 500 | Y | 44.9 | 64.7 | 47.7 | 23.7 | 49.5 | 62.3 |
| CenterNet* | HG-104 | 190 | Y | 47.0 | 64.5 | 50.7 | 28.9 | 49.9 | 58.9 |
| CPNDet* | HG-104 | 100 | Y | 49.2 | 67.4 | 53.7 | 31.0 | 51.9 | 62.4 |
| BorderDet* | X-101-64x4d-DCN | 24 | Y | 50.3 | 68.9 | 55.2 | 32.8 | 52.8 | 62.3 |
| FCOS-BiFPN | X-101-32x8-DCN | 24 | Y | 50.4 | 68.9 | 55.0 | 33.2 | 53.0 | 62.7 |
| RepPointsV2* | X-101-64x4d-DCN | 24 | Y | 52.1 | 70.1 | 57.5 | 34.5 | 54.6 | 63.6 |
| LSNet | R-50 | 24 | Y | 44.8 | 64.1 | 48.8 | 26.6 | 47.7 | 55.7 |
| LSNet | X-101-64x4d | 24 | Y | 48.2 | 67.6 | 52.6 | 29.6 | 51.3 | 60.5 |
| LSNet | X-101-64x4d-DCN | 24 | Y | 49.6 | 69.0 | 54.1 | 30.3 | 52.8 | 62.8 |
| LSNet-CPV | X-101-64x4d-DCN | 24 | Y | 50.4 | 69.4 | 54.5 | 31.0 | 53.3 | 64.0 |
| LSNet-CPV | R2-101-DCN | 24 | Y | 51.1 | 70.3 | 55.2 | 31.2 | 54.3 | 65.0 |
| LSNet-CPV* | R2-101-DCN | 24 | Y | 53.5 | 71.1 | 59.2 | 35.2 | 56.4 | 65.8 |

A comparison between LSNet and state-of-the-art methods for object detection on the MS-COCO test-dev set. LSNet surpasses all competitors in the anchor-free group. The abbreviations are: ‘R’ – ResNet, ‘X’ – ResNeXt, ‘HG’ – Hourglass network, ‘R2’ – Res2Net, ‘CPV’ – corner point verification, ‘MStrain’ – multi-scale training, * – multi-scale testing.

Segm AP(%) on COCO test-dev

| Method | Backbone | Epoch | AP | AP50 | AP75 | APS | APM | APL |
|---|---|---|---|---|---|---|---|---|
| Pixel-based: | | | | | | | | |
| YOLACT | R-101 | 48 | 31.2 | 50.6 | 32.8 | 12.1 | 33.3 | 47.1 |
| TensorMask | R-101 | 72 | 37.1 | 59.3 | 39.4 | 17.1 | 39.1 | 51.6 |
| Mask R-CNN | X-101-32x4d | 12 | 37.1 | 60.0 | 39.4 | 16.9 | 39.9 | 53.5 |
| HTC | X-101-64x4d | 20 | 41.2 | 63.9 | 44.7 | 22.8 | 43.9 | 54.6 |
| DetectoRS* | X-101-64x4d | 40 | 48.5 | 72.0 | 53.3 | 31.6 | 50.9 | 61.5 |
| Contour-based: | | | | | | | | |
| ExtremeNet | HG-104 | 100 | 18.9 | 44.5 | 13.7 | 10.4 | 20.4 | 28.3 |
| DeepSnake | DLA-34 | 120 | 30.3 | - | - | - | - | - |
| PolarMask | X-101-64x4d-DCN | 24 | 36.2 | 59.4 | 37.7 | 17.8 | 37.7 | 51.5 |
| LSNet | X-101-64x4d-DCN | 30 | 37.6 | 64.0 | 38.3 | 22.1 | 39.9 | 49.1 |
| LSNet | R2-101-DCN | 30 | 38.0 | 64.6 | 39.0 | 22.4 | 40.6 | 49.2 |
| LSNet* | X-101-64x4d-DCN | 30 | 39.7 | 65.5 | 41.3 | 25.5 | 41.3 | 50.4 |
| LSNet* | R2-101-DCN | 30 | 40.2 | 66.2 | 42.1 | 25.8 | 42.2 | 51.0 |

Comparison of LSNet to state-of-the-art methods on the instance segmentation task on the COCO test-dev set. Our LSNet achieves state-of-the-art accuracy for contour-based instance segmentation. ‘R’ - ResNet, ‘X’ - ResNeXt, ‘HG’ - Hourglass, ‘R2’ - Res2Net, * - multi-scale testing.

Keypoints AP(%) on COCO test-dev

| Method | Backbone | Epoch | AP | AP50 | AP75 | APM | APL |
|---|---|---|---|---|---|---|---|
| Heatmap-based: | | | | | | | |
| CenterNet-jd | DLA-34 | 320 | 57.9 | 84.7 | 63.1 | 52.5 | 67.4 |
| OpenPose | VGG-19 | - | 61.8 | 84.9 | 67.5 | 58.0 | 70.4 |
| Pose-AE | HG | 300 | 62.8 | 84.6 | 69.2 | 57.5 | 70.6 |
| CenterNet-jd | HG104 | 150 | 63.0 | 86.8 | 69.6 | 58.9 | 70.4 |
| Mask R-CNN | R-50 | 28 | 63.1 | 87.3 | 68.7 | 57.8 | 71.4 |
| PersonLab | R-152 | >1000 | 66.5 | 85.5 | 71.3 | 62.3 | 70.0 |
| HRNet | HRNet-W32 | 210 | 74.9 | 92.5 | 82.8 | 71.3 | 80.9 |
| Regression-based: | | | | | | | |
| CenterNet-reg [66] | DLA-34 | 320 | 51.7 | 81.4 | 55.2 | 44.6 | 63.0 |
| CenterNet-reg [66] | HG-104 | 150 | 55.0 | 83.5 | 59.7 | 49.4 | 64.0 |
| LSNet w/ obj-box | X-101-64x4d-DCN | 60 | 55.7 | 81.3 | 61.0 | 52.9 | 60.5 |
| LSNet w/ kps-box | X-101-64x4d-DCN | 20 | 59.0 | 83.6 | 65.2 | 53.3 | 67.9 |

Comparison of LSNet to state-of-the-art methods on the pose estimation task on the COCO test-dev set. LSNet predicts the keypoints by regression. ‘obj-box’ and ‘kps-box’ denote the object bounding boxes and the keypoint-boxes, respectively. For LSNet w/ kps-box, we fine-tune the model from the LSNet w/ obj-box for another 20 epochs.

Visualization

Some location-sensitive visual recognition results on the MS-COCO validation set.

We compare with CenterNet to show that our LSNet w/ ‘obj-box’ tends to predict more small-scale human poses, including ones that are not annotated in the dataset. Only pose results with scores higher than 0.3 are shown for both methods.

Left: LSNet uses the object bounding boxes to assign training samples. Right: LSNet uses the keypoint-boxes to assign training samples. Although LSNet with keypoint-boxes achieves a higher AP score, its ability to perceive multi-scale human instances is weakened.

Preparation

The master branch works with PyTorch 1.5.0.

The dataset directory should be like this:

├── data
│   ├── coco
│   │   ├── annotations
│   │   ├── images
│   │   │   ├── train2017
│   │   │   ├── val2017
│   │   │   ├── test2017
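
Before training, it can help to confirm that the dataset is laid out as shown above. The following is a small sketch, assuming the three image splits sit under `coco/images`; the function name and the `EXPECTED_DIRS` list are hypothetical helpers, not part of this repository.

```python
from pathlib import Path
from typing import List

# Sub-directories expected under the data root, following the tree above.
# Assumption: the train/val/test splits live inside coco/images.
EXPECTED_DIRS = [
    "coco/annotations",
    "coco/images/train2017",
    "coco/images/val2017",
    "coco/images/test2017",
]


def missing_dataset_dirs(data_root: str) -> List[str]:
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```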

Generate extreme point annotation from segmentation:

  • cd code/tools
  • python gen_coco_lsvr.py
  • cd ..
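
For readers curious what "extreme points from segmentation" means, here is an illustrative sketch and not the contents of `gen_coco_lsvr.py`. It assumes a COCO-style polygon stored as a flat list `[x1, y1, x2, y2, ...]` and image coordinates where y grows downward; the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def extreme_points(polygon: List[float]) -> Dict[str, Point]:
    """Pick the leftmost, topmost, rightmost and bottommost polygon vertices."""
    # Pair up the flat [x1, y1, x2, y2, ...] list into (x, y) vertices.
    points = list(zip(polygon[0::2], polygon[1::2]))
    return {
        "left": min(points, key=lambda p: p[0]),
        "top": min(points, key=lambda p: p[1]),  # smallest y is the top of the image
        "right": max(points, key=lambda p: p[0]),
        "bottom": max(points, key=lambda p: p[1]),
    }


# Hypothetical four-vertex polygon.
example = extreme_points([10.0, 20.0, 30.0, 5.0, 50.0, 25.0, 25.0, 40.0])
```

When several vertices tie for an extreme, this sketch simply keeps the first one encountered; a real annotation script may resolve ties differently.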

Installation

1. Installing cocoapi
  • cd cocoapi/pycocotools
  • python setup.py develop
  • cd ../..
2. Installing mmcv
  • cd mmcv
  • pip install -e .
  • cd ..
3. Installing mmdet
  • python setup.py develop

Training and Evaluation

Our LSNet is based on mmdetection. Please refer to the mmdetection documentation on training and evaluating with existing datasets.
