NeurIPS 2021 Datasets and Benchmarks Track

AP-10K: A Benchmark for Animal Pose Estimation in the Wild

Introduction | Updates | Overview | Download | Training Code | Key Questions | License

Introduction

This is the official repository of AP-10K: A Benchmark for Animal Pose Estimation in the Wild (NeurIPS 2021 Datasets and Benchmarks Track). It contains the introduction, annotation files, and code for the AP-10K dataset, which is the first large-scale dataset for general animal pose estimation. AP-10K consists of 10,015 images collected and filtered from 23 animal families and 54 species, with high-quality keypoint annotations. We also provide about 50k additional images with family and species labels. The dataset can be used for supervised learning, cross-domain transfer learning, and intra- and inter-family domain generalization. It can also be used for self-supervised learning, semi-supervised learning, etc. The annotation files follow the COCO style.
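
Since the annotation files follow the COCO style, a quick way to get oriented is to count images, annotations, and categories. The sketch below assumes only the standard top-level COCO keys (`images`, `annotations`, `categories`); the helper names `load_coco` and `summarize_coco` are hypothetical, and the example dictionary is synthetic, not real AP-10K data.

```python
import json
from collections import Counter


def load_coco(path):
    """Read a COCO-style annotation file into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco):
    """Return basic counts for a COCO-style annotation dictionary."""
    annotations = coco.get("annotations", [])
    per_category = Counter(ann["category_id"] for ann in annotations)
    return {
        "num_images": len(coco.get("images", [])),
        "num_annotations": len(annotations),
        "num_categories": len(coco.get("categories", [])),
        "annotations_per_category": dict(per_category),
    }


# Tiny synthetic stand-in for a real annotation file (all values are made up).
example = {
    "images": [
        {"id": 1, "file_name": "000000000001.jpg"},
        {"id": 2, "file_name": "000000000002.jpg"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
    "categories": [
        {"id": 1, "name": "species_a"},
        {"id": 2, "name": "species_b"},
    ],
}

summary = summarize_coco(example)
print(summary)
```

With the dataset extracted as described in the Training Code section, the same summary could be obtained with `summarize_coco(load_coco("data/ap10k/annotations/ap10k-train-split1.json"))`.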

Updates

01/11/2021 We have uploaded the corresponding code and pretrained models for use with the AP-10K dataset!

01/11/2021 We have updated the dataset! It now has 54 species for training!

01/11/2021 The AP-10K dataset is integrated into mmpose! Please enjoy it!

11/10/2021 The paper is accepted to NeurIPS 2021 Datasets and Benchmarks Track!

31/08/2021 The paper is posted on arXiv! We have uploaded the annotation file!

Overview

Keypoint Definition

Keypoint  Description       Keypoint  Description
1         Left Eye          2         Right Eye
3         Nose              4         Neck
5         Root of Tail      6         Left Shoulder
7         Left Elbow        8         Left Front Paw
9         Right Shoulder    10        Right Elbow
11        Right Front Paw   12        Left Hip
13        Left Knee         14        Left Back Paw
15        Right Hip         16        Right Knee
17        Right Back Paw
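
The table above can be paired with the COCO keypoint layout, where each instance stores a flat list `[x1, y1, v1, x2, y2, v2, ...]`. The sketch below assumes the annotation files keep the keypoints in the table's order and use the usual COCO visibility flag (`v = 0` means not labeled). The snake_case names and the helper functions are hypothetical, chosen for illustration only.

```python
# Keypoint names in the order of the definition table (1-based ids 1..17).
# The snake_case spelling is an assumption made for this sketch.
KEYPOINT_NAMES = [
    "left_eye", "right_eye", "nose", "neck", "root_of_tail",
    "left_shoulder", "left_elbow", "left_front_paw",
    "right_shoulder", "right_elbow", "right_front_paw",
    "left_hip", "left_knee", "left_back_paw",
    "right_hip", "right_knee", "right_back_paw",
]


def named_keypoints(flat):
    """Map a COCO-style flat list [x1, y1, v1, ...] to {name: (x, y, v)}."""
    if len(flat) != 3 * len(KEYPOINT_NAMES):
        raise ValueError("expected %d values, got %d"
                         % (3 * len(KEYPOINT_NAMES), len(flat)))
    return {
        name: tuple(flat[3 * i:3 * i + 3])
        for i, name in enumerate(KEYPOINT_NAMES)
    }


def labeled_names(flat):
    """List the names of keypoints whose visibility flag is non-zero."""
    return [name for name, (_, _, v) in named_keypoints(flat).items() if v > 0]


# Synthetic instance: every keypoint unlabeled except the nose (id 3).
flat = [0, 0, 0] * 17
flat[6:9] = [120.0, 80.0, 2]

print(named_keypoints(flat)["nose"])
print(labeled_names(flat))
```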

Annotations Overview

Image Background

Id  Background type          Id  Background type
1   grass or savanna         2   forest or shrub
3   mud or rock              4   snowfield
5   zoo or human habitation  6   swamp or riverside
7   desert or gobi           8   mugshot

Download

The dataset and corresponding files can be downloaded from

[Google Drive] [Baidu Pan] (code: 6uz6)

(Optional) The full version with both labeled and unlabeled images can be downloaded with the script provided here

[Google Drive] [Baidu Pan] (code: 5lxi)

Training Code

Here we provide an example of training models with the AP-10K dataset. The code is based on the mmpose project.

Installation

Please refer to install.md for installation instructions.

Dataset Preparation

Please download the dataset from the Download section and extract it under the data folder, e.g.,

mkdir data
unzip ap-10k.zip -d data/
mv data/ap-10k data/ap10k

The extracted dataset should look like:

AP-10K
├── mmpose
├── docs
├── tests
├── tools
├── configs
└── data
    └── ap10k
        ├── annotations
        │   ├── ap10k-train-split1.json
        │   ├── ap10k-train-split2.json
        │   ├── ap10k-train-split3.json
        │   ├── ap10k-val-split1.json
        │   ├── ap10k-val-split2.json
        │   ├── ap10k-val-split3.json
        │   ├── ap10k-test-split1.json
        │   ├── ap10k-test-split2.json
        │   └── ap10k-test-split3.json
        └── data
            ├── 000000000001.jpg
            ├── 000000000002.jpg
            └── ...
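
Before training, it can help to confirm that the extracted files match the layout above. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it checks only the nine annotation files and the image folder listed in the tree.

```python
from pathlib import Path


def expected_annotation_files():
    """Annotation file names listed in the directory tree above."""
    return [
        "ap10k-%s-split%d.json" % (subset, split)
        for subset in ("train", "val", "test")
        for split in (1, 2, 3)
    ]


def missing_files(root="data/ap10k"):
    """Return the expected paths under `root` that do not exist."""
    root = Path(root)
    missing = []
    for name in expected_annotation_files():
        path = root / "annotations" / name
        if not path.is_file():
            missing.append(str(path))
    # The images are expected in a sub-folder that is also called "data".
    if not (root / "data").is_dir():
        missing.append(str(root / "data"))
    return missing


problems = missing_files("data/ap10k")
if problems:
    print("Missing entries:")
    for entry in problems:
        print("  " + entry)
else:
    print("Dataset layout looks complete.")
```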

Inference

The checkpoints can be downloaded from HRNet-w32, HRNet-w48, ResNet-50, ResNet-101.

python tools/test.py <CONFIG_FILE> <DET_CHECKPOINT_FILE>

Training

bash tools/dist_train.sh <CONFIG_FILE> <GPU_NUM>

For example, to train the HRNet-w32 model with 1 GPU, please run:

bash tools/dist_train.sh configs/animal/2d_kpt_sview_rgb_img/topdown_heatmap/ap10k/hrnet_w32_ap10k_256x256.py 1

Key Questions

1. For what purpose was the dataset created?

AP-10K was created to facilitate research on animal pose estimation. With more training data from diverse species available, several challenging questions become important to study, such as:

  1. How well do different representative human pose models perform on the animal pose estimation task?
  2. Does the representation ability of a deep model benefit from training on a large-scale dataset with diverse species?
  3. What is the impact of pretraining, e.g., on the ImageNet dataset or a human pose estimation dataset, when a large-scale dataset with diverse species is available?
  4. How well does a model trained on data from specific species or families generalize within and across families?

However, previous datasets for animal pose estimation contain a limited number of animal species. It is therefore impossible to study these questions with existing datasets, as they contain at most 5 species, which is far from enough to draw sound conclusions. By contrast, AP-10K has 23 families and 54 species and can thus help researchers study these questions.

2. Was any cleaning of the data done?

We removed duplicate images by using the aHash algorithm to detect similar images and then checking them manually. Images with heavy occlusion and logos were removed manually. The cleaned images were categorized into different species and families.
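
For readers unfamiliar with aHash (average hash), the idea can be sketched as follows: shrink the image to a small grayscale grid, set one bit per cell depending on whether the cell is brighter than the grid mean, and compare two hashes by Hamming distance. This is an illustrative pure-Python version operating on a 2D list of grayscale values, not the code used to build AP-10K. The grid size of 8 and the block-averaging downsample are assumptions, and the source does not state which distance threshold was used.

```python
def average_hash(gray, hash_size=8):
    """Compute an average hash from a 2D list of grayscale values.

    The image is reduced to hash_size x hash_size cells by block averaging;
    each bit is 1 when its cell is brighter than the mean of all cells.
    """
    height, width = len(gray), len(gray[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image is smaller than the hash grid")
    cells = []
    for r in range(hash_size):
        r0, r1 = r * height // hash_size, (r + 1) * height // hash_size
        for c in range(hash_size):
            c0, c1 = c * width // hash_size, (c + 1) * width // hash_size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic 16x16 images: a horizontal gradient, a slightly brighter copy,
# and an inverted copy.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[value + 5 for value in row] for row in original]
inverted = [[255 - value for value in row] for row in original]

h_original = average_hash(original)
h_brighter = average_hash(brighter)
h_inverted = average_hash(inverted)

print(hamming_distance(h_original, h_brighter))  # near-duplicate
print(hamming_distance(h_original, h_inverted))  # very different image
```

A small distance flags a candidate duplicate pair, which would then go to the manual check described above.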

3. How were annotators instructed to label the keypoints?

Annotators first learned about the physiognomy, body structure and distribution of keypoints of the animals. Then, five images of each species were presented to annotators to annotate keypoints, which were used to assess their annotation quality. Annotators with good annotation quality were further trained on how to deal with the partial absence of the body due to occlusion and were involved in the subsequent annotation process. Annotators were asked to annotate all visible keypoints. For the occluded keypoints, they were asked to annotate keypoints whose location they could estimate based on body plan, pose, and the symmetry property of the body, where the length of occluded limbs or the location of occluded keypoints could be inferred from the visible limbs or keypoints. Other keypoints were left unlabeled.

To guarantee the annotation quality, we adopted a sequential labeling strategy. Three rounds of cross-checking and correction were conducted, with both manual and automatic checks (according to specific rules, e.g., keypoints belonging to an instance are in the same bounding box), to reduce the possibility of mislabeling. First, annotators labeled the keypoints of each instance and submitted version-1 labels to senior, well-trained annotators. The senior annotators then checked the quality of the version-1 labels and returned an error list, and the annotators fixed the errors accordingly. Finally, annotators submitted the corrected version-2 labels to the senior annotators, who made the last corrections to catch any remaining mislabeled keypoints. After all three rounds were done, a release version of the dataset with high-quality labels was finished.
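
The automatic rule mentioned above (keypoints of an instance lie in the same bounding box) can be illustrated with a short check on one COCO-style annotation. This is a sketch under assumptions, not the authors' checking script: it takes `bbox` as `[x, y, width, height]` and `keypoints` as a flat `[x, y, v, ...]` list per the COCO convention, and the function name and `tolerance` parameter are hypothetical.

```python
def keypoints_outside_bbox(annotation, tolerance=0.0):
    """Return 1-based ids of labeled keypoints that fall outside the bbox.

    Keypoints with visibility flag 0 (not labeled) are skipped.
    """
    x, y, w, h = annotation["bbox"]
    flat = annotation["keypoints"]
    outside = []
    for i in range(0, len(flat), 3):
        kx, ky, v = flat[i:i + 3]
        if v == 0:
            continue
        inside_x = x - tolerance <= kx <= x + w + tolerance
        inside_y = y - tolerance <= ky <= y + h + tolerance
        if not (inside_x and inside_y):
            outside.append(i // 3 + 1)
    return outside


# Synthetic annotation with three keypoints: the first is inside the box,
# the second is far to the right of it, and the third is unlabeled.
annotation = {
    "bbox": [10, 20, 100, 50],
    "keypoints": [15, 25, 2, 200, 30, 2, 0, 0, 0],
}

suspicious = keypoints_outside_bbox(annotation)
print(suspicious)
```

Instances with a non-empty result would be candidates for the error list returned to annotators.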

4. Unified keypoints across different walk types

If we only followed biology and defined the keypoints by the position of the bones, some keypoints would be hard to label or even invisible, and would look inconsistent with the animal's movement. Ungulates (or other unguligrade animals) mainly rely on their toes in movement, with their paws, ankles, and knees observable. Compared with these keypoints, the actual hips are less distinctive and difficult to annotate since they are hidden in the body. A similar phenomenon can also be observed in digitigrade animals. On the other hand, plantigrade animals always walk with metatarsals (paws) flat on the ground, with their paws, knees, and hips more distinguishable in movement. Thus, we annotate the paws, ankles, and knees for the unguligrade and digitigrade animals, and the paws, knees, and hips for the plantigrade animals. For simplicity, we use 'hip' to denote the knees of unguligrade and digitigrade animals and 'knee' to denote their ankles. For plantigrade animals, the annotation is the same as the biological definition. As a result, the visual distribution of keypoints is similar across the dataset, as the 'knee' is around the middle of the limbs for all animals.
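
The naming convention described above can be summarized as a small lookup table from the dataset label to the anatomical joint it marks. The dictionary and function below are an illustrative restatement of this paragraph only. They cover just the 'hip', 'knee', and 'paw' labels discussed here, and the names are hypothetical rather than taken from the annotation files.

```python
# Dataset label -> anatomical joint, per walk type, as described in the text.
LABEL_TO_ANATOMY = {
    "plantigrade": {"hip": "hip", "knee": "knee", "paw": "paw"},
    "digitigrade": {"hip": "knee", "knee": "ankle", "paw": "paw"},
    "unguligrade": {"hip": "knee", "knee": "ankle", "paw": "paw"},
}


def anatomical_joint(label, walk_type):
    """Return the anatomical joint that a dataset label refers to."""
    return LABEL_TO_ANATOMY[walk_type][label]


print(anatomical_joint("knee", "unguligrade"))
print(anatomical_joint("knee", "plantigrade"))
```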

5. What tasks could the dataset be used for?

AP-10K can be used for research on animal pose estimation. It can also be used for specific machine learning topics such as few-shot learning, domain generalization, and self-supervised learning. Please see the Discussion part of the paper.

License

The dataset follows the CC-BY-4.0 license.
