The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

Last update: Dec 28, 2022

Overview

3D Human Pose Estimation with Spatial and Temporal Transformers

This repo is the official implementation for 3D Human Pose Estimation with Spatial and Temporal Transformers.

Video Demonstration

PoseFormer Architecture

Video Demo


3D HPE on Human3.6M


3D HPE on videos in-the-wild using PoseFormer

Our code is built on top of VideoPose3D.

Environment

The code was developed and tested under the following environment:

Python 3.8.2
PyTorch 1.7.1
CUDA 11.0

You can create the conda environment from poseformer.yml:

conda env create -f poseformer.yml

Dataset

Our code is compatible with the dataset setup introduced by Martinez et al. and Pavllo et al. Please refer to VideoPose3D to set up the Human3.6M dataset (in the ./data directory).

Evaluating pre-trained models

We provide the pre-trained 81-frame model (CPN-detected 2D poses as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin

We also provide a pre-trained 81-frame model (ground-truth 2D poses as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k gt -f 81 -c checkpoint --evaluate gt81f.bin

Training new models

To train a model from scratch (CPN detected 2D pose as input), run:

python run_poseformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.00004 -lrd 0.99

-f controls how many frames are used as input. With 27 frames the model achieves 47.0 mm (MPJPE); with 81 frames it achieves 44.3 mm.
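To make the role of the frame count concrete, here is a minimal sketch of how a window of f input frames around a target frame could be selected. It assumes the window is centred on the target frame and that positions falling outside the clip are filled by repeating the first or last frame; the function name frame_window and this padding choice are illustrative assumptions, not code taken from this repository.

```python
def frame_window(num_frames, center, f):
    """Return the f frame indices centred on `center`.

    Indices that fall outside the clip are clamped to the first or last
    frame, which mimics edge-replication padding (an assumption here,
    not a statement about the repository's data loader).
    """
    if f % 2 == 0:
        raise ValueError("f is assumed to be odd so the window has a centre frame")
    half = f // 2
    return [
        min(max(i, 0), num_frames - 1)
        for i in range(center - half, center + half + 1)
    ]


# A 27-frame window around frame 3 of a 100-frame clip: the 10 positions
# before the clip start are clamped to frame 0.
window = frame_window(num_frames=100, center=3, f=27)
print(len(window), window[0], window[13], window[-1])  # 27 0 3 16
```

A larger f gives the model more temporal context per prediction, at the cost of a longer input sequence.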

To train a model from scratch (Ground truth 2D pose as input), run:

python run_poseformer.py -k gt -f 81 -lr 0.0004 -lrd 0.99

With 81 frames, the model achieves 31.3 mm (MPJPE).
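MPJPE (Mean Per Joint Position Error) is the mean Euclidean distance between predicted and ground-truth joint positions, reported here in millimetres. The following is a minimal pure-Python sketch of the metric for a single pose; the function name mpjpe and the toy coordinates are illustrative and are not taken from this repository's evaluation code.

```python
import math


def mpjpe(predicted, target):
    """Mean Euclidean distance between corresponding joints.

    predicted, target: equal-length lists of (x, y, z) joint positions
    expressed in the same unit (e.g. millimetres).
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


# Toy two-joint pose: the first joint is exact, the second is off by 5 units.
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, truth))  # 2.5
```

Over a dataset, the per-pose values are typically averaged across all evaluated frames.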

Visualization and other functions

We keep our code consistent with VideoPose3D. Please refer to their project page for further information.

Bibtex

If you find our work useful in your research, please consider citing:

@article{zheng20213d,
title={3D Human Pose Estimation with Spatial and Temporal Transformers},
author={Zheng, Ce and Zhu, Sijie and Mendieta, Matias and Yang, Taojiannan and Chen, Chen and Ding, Zhengming},
journal={arXiv preprint arXiv:2103.10455},
year={2021}
}

Acknowledgement

Part of our code is borrowed from VideoPose3D. We thank the authors for releasing their code.


Owner

Ce Zheng
