This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

Overview

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation

This repo is the official implementation of "DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation". [Paper] [Project]

Update

  • Clean version is released! It currently includes code, data, log and models for the following tasks:
  • 2D human pose estimation
  • 3D human pose estimation
  • Body recovery via a SMPL model

TODO

  • Provide different sample interval checkpoints/logs
  • Add DeciWatch in MMHuman3D

Description

This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch. Unlike current solutions that estimate each frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motions and the lightweight pose representation. Specifically, DeciWatch uniformly samples less than 10% video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the rest of the frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation, body mesh recovery tasks and efficient labeling in videos with four datasets validate the efficiency and effectiveness of DeciWatch.

Getting Started

Environment Requirement

DeciWatch has been implemented and tested on Pytorch 1.10.1 with python >= 3.6. It supports both GPU and CPU inference.

Clone the repo:

git clone https://github.com/cure-lab/DeciWatch.git

We recommend you install the requirements using conda:

# conda
source scripts/install_conda.sh

Prepare Data

All the data used in our experiment can be downloaded here.

Google Drive

Baidu Netdisk

Valid data includes:

Dataset Pose Estimator 3D Pose 2D Pose SMPL
Sub-JHMDB SimplePose
3DPW EFT
3DPW PARE
3DPW SPIN
Human3.6M FCN
AIST++ SPIN

Please refer to doc/data.md for detailed data information and data preparing.

Training

Run the commands below to start training:

python train.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python train.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Note that the training and testing datasets should be downloaded and prepared before training.

You may refer to doc/training.md for more training details.

Evaluation

Results on 2D Pose

Dataset Estimator PCK 0.05 (INPUT/OUTPUT) PCK 0.1 (INPUT/OUTPUT) PCK 0.2 (INPUT/OUTPUT) Download
Sub-JHMDB simplepose 57.30%/79.32% 81.61%/94.27% 93.94%/98.85% Baidu Netdisk / Google Drive

Results on 3D Pose

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) Download
3DPW SPIN 96.92/93.34 34.68/7.06 Baidu Netdisk / Google Drive
3DPW EFT 90.34/89.02 32.83/6.84 Baidu Netdisk / Google Drive
3DPW PARE 78.98/77.16 25.75/6.90 Baidu Netdisk / Google Drive
AIST++ SPIN 107.26/71.27 33.37/5.68 Baidu Netdisk / Google Drive
Human3.6M FCN 54.56/52.83 19.18/1.47 Baidu Netdisk / Google Drive

Results on SMPL

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) MPVPE (INPUT/OUTPUT) Download
3DPW SPIN 100.13/97.53 35.53/8.38 114.39/112.84 Baidu Netdisk / Google Drive
3DPW EFT 91.60/92.56 33.57/8.7 5 110.34/109.27 Baidu Netdisk / Google Drive
3DPW PARE 80.44/81.76 26.77/7.24 94.88/95.68 Baidu Netdisk / Google Drive
AIST++ SPIN 108.25/82.10 33.83/7.27 137.51/106.08 Baidu Netdisk / Google Drive

Noted that although our main contribution is the efficiency improvement, using DeciWatch as post processing is also helpful for accuracy and smoothness improvement.

You may refer to doc/evaluate.md for evaluate details.

Quick Demo

Run the commands below to visualize demo:

python demo.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

You are supposed to put corresponding images with the data structure:

|-- data
    |-- videos
        |-- pw3d 
            |-- downtown_enterShop_00
                |-- image_00000.jpg
                |-- ...
            |-- ...
        |-- jhmdb
            |-- catch
            |-- ...
        |-- aist
            |-- gWA_sFM_c01_d27_mWA2_ch21.mp4
            |-- ...
        |-- ...

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python demo.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Please refer to the dataset website for the raw images. You may change the config in lib/core/config.py for different visualization parameters.

You may refer to doc/visualize.md for visualization details.

Citing DeciWatch

If you find this repository useful for your work, please consider citing it as follows:

@article{zeng2022deciwatch,
  title={DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation},
  author={Zeng, Ailing and Ju, Xuan and Yang, Lei and Gao, Ruiyuan and Zhu, Xizhou and Dai, Bo and Xu, Qiang},
  journal={arXiv preprint arXiv:2203.08713},
  year={2022}
}

Please remember to cite all the datasets and backbone estimators if you use them in your experiments.

Acknowledgement

Many thanks to Xuan Ju for her great efforts to clean almost the original code!!!

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale

XtremeDistilTransformers for Distilling Massive Multilingual Neural Networks ACL 2020 Microsoft Research [Paper] [Video] Releasing [XtremeDistilTransf

Microsoft 125 Jan 04, 2023
A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

A boosting-based Multiple Instance Learning (MIL) package that includes MIL-Boost and MCIL-Boost

Jun-Yan Zhu 27 Aug 08, 2022
NeRD: Neural Reflectance Decomposition from Image Collections

NeRD: Neural Reflectance Decomposition from Image Collections Project Page | Video | Paper | Dataset Implementation for NeRD. A novel method which dec

Computergraphics (University of Tübingen) 195 Dec 29, 2022
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 03, 2022
Implementation of the Remixer Block from the Remixer paper, in Pytorch

Remixer - Pytorch Implementation of the Remixer Block from the Remixer paper, in Pytorch. It claims that substituting the feedforwards in transformers

Phil Wang 35 Aug 23, 2022
An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wheat Detection (2021).

Global-Wheat-Detection An efficient PyTorch library for Global Wheat Detection using YOLOv5. The project is based on this Kaggle competition Global Wh

Chuxin Wang 11 Sep 25, 2022
My usage of Real-ESRGAN to upscale anime, some test and results in the test_img folder

anime upscaler My usage of Real-ESRGAN to upscale anime, I hope to use this on a proper GPU cuz doing this on CPU is completely shit 😂 , I even tried

Shangar Muhunthan 29 Jan 07, 2023
This is a collection of our NAS and Vision Transformer work.

This is a collection of our NAS and Vision Transformer work.

Microsoft 828 Dec 28, 2022
A smart Chat bot that can help to know about corona virus and Make prediction of corona using X-ray.

TRINIT_Hum_kuchh_nahi_karenge_ML01 Document Link https://github.com/Jatin-Goyal-552/TRINIT_Hum_kuchh_nahi_karenge_ML01/blob/main/hum_kuchh_nahi_kareng

JatinGoyal 1 Feb 03, 2022
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021)

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation (CVPR 2021, official Pytorch implementatio

Microsoft 247 Dec 25, 2022
D-NeRF: Neural Radiance Fields for Dynamic Scenes

D-NeRF: Neural Radiance Fields for Dynamic Scenes [Project] [Paper] D-NeRF is a method for synthesizing novel views, at an arbitrary point in time, of

Albert Pumarola 291 Jan 02, 2023
Deep Compression for Dense Point Cloud Maps.

DEPOCO This repository implements the algorithms described in our paper Deep Compression for Dense Point Cloud Maps. How to get started (using Docker)

Photogrammetry & Robotics Bonn 67 Dec 06, 2022
Pytorch reimplementation of PSM-Net: "Pyramid Stereo Matching Network"

This is a Pytorch Lightning version PSMNet which is based on JiaRenChang/PSMNet. use python main.py to start training. PSM-Net Pytorch reimplementatio

XIAOTIAN LIU 1 Nov 25, 2021
Codes to calculate solar-sensor zenith and azimuth angles directly from hyperspectral images collected by UAV. Works only for UAVs that have high resolution GNSS/IMU unit.

UAV Solar-Sensor Angle Calculation Table of Contents About The Project Built With Getting Started Prerequisites Installation Datasets Contributing Lic

Sourav Bhadra 1 Jan 15, 2022
GDSC-ML Team Interview Task

GDSC-ML-Team---Interview-Task Task 1 : Clean or Messy room In this task we have to classify the given test images as clean or messy. - Link for datase

Aayush. 1 Jan 19, 2022
PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement.

DECOR-GAN PyTorch 1.5 implementation for paper DECOR-GAN: 3D Shape Detailization by Conditional Refinement, Zhiqin Chen, Vladimir G. Kim, Matthew Fish

Zhiqin Chen 72 Dec 31, 2022
A collection of educational notebooks on multi-view geometry and computer vision.

Multiview notebooks This is a collection of educational notebooks on multi-view geometry and computer vision. Subjects covered in these notebooks incl

Max 65 Dec 09, 2022
AI-based, context-driven network device ranking

Batea A batea is a large shallow pan of wood or iron traditionally used by gold prospectors for washing sand and gravel to recover gold nuggets. Batea

Secureworks Taegis VDR 269 Nov 26, 2022
A simple, fast, and efficient object detector without FPN

You Only Look One-level Feature (YOLOF), CVPR2021 A simple, fast, and efficient object detector without FPN. This repo provides an implementation for

789 Jan 09, 2023
Author's PyTorch implementation of TD3+BC, a simple variant of TD3 for offline RL

A Minimalist Approach to Offline Reinforcement Learning TD3+BC is a simple approach to offline RL where only two changes are made to TD3: (1) a weight

Scott Fujimoto 193 Dec 23, 2022