The project is an official implementation of our paper "3D Human Pose Estimation with Spatial and Temporal Transformers".

Last update: Dec 28, 2022

3D Human Pose Estimation with Spatial and Temporal Transformers

Video Demonstration

PoseFormer Architecture

[Figure: PoseFormer architecture]

Video Demo

[Animation: 3D HPE on Human3.6M]

[Animation: 3D HPE on videos in-the-wild using PoseFormer]

Our code is built on top of VideoPose3D.

Environment

The code is developed and tested under the following environment:

Python 3.8.2
PyTorch 1.7.1
CUDA 11.0

You can create the environment from the provided conda specification:

conda env create -f poseformer.yml
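After activating the environment, it can be useful to confirm that the installed versions match the ones listed above. The following is a minimal sketch, not part of the repository: the helper name `environment_report` is hypothetical, and it only reports what it finds rather than enforcing any version.

```python
import platform


def environment_report():
    """Collect the Python, PyTorch and CUDA versions visible to this interpreter.

    Hypothetical helper for illustration; it is not shipped with PoseFormer.
    """
    report = {"python": platform.python_version(), "torch": None, "cuda": None}
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return report
    report["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["cuda"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Compare the printed values against Python 3.8.2, PyTorch 1.7.1 and CUDA 11.0; other versions may still work, but they are not the tested configuration.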

Dataset

Our code is compatible with the dataset setup introduced by Martinez et al. and Pavllo et al. Please refer to VideoPose3D to set up the Human3.6M dataset (in the ./data directory).
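Before training or evaluating, you may want to check that the dataset files are in place. The sketch below assumes the file naming used by VideoPose3D (`data_3d_<dataset>.npz` and `data_2d_<dataset>_<keypoints>.npz`); that naming is an assumption here, so adjust it if your setup differs. The helper names are hypothetical and not part of the repository.

```python
from pathlib import Path


def expected_dataset_files(keypoints, data_dir="data", dataset="h36m"):
    """List the files the data loader is assumed to read for a given -k value.

    The naming scheme follows VideoPose3D's convention (an assumption).
    """
    root = Path(data_dir)
    return [
        root / f"data_3d_{dataset}.npz",              # 3D ground-truth poses
        root / f"data_2d_{dataset}_{keypoints}.npz",  # 2D input keypoints
    ]


def missing_dataset_files(keypoints, data_dir="data", dataset="h36m"):
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_dataset_files(keypoints, data_dir, dataset)
            if not p.is_file()]


if __name__ == "__main__":
    for k in ("cpn_ft_h36m_dbb", "gt"):
        missing = missing_dataset_files(k)
        print(k, "OK" if not missing else f"missing: {[str(p) for p in missing]}")
```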

Evaluating pre-trained models

We provide the pre-trained 81-frame model (CPN detected 2D pose as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k cpn_ft_h36m_dbb -f 81 -c checkpoint --evaluate detected81f.bin

We also provide a pre-trained 81-frame model (ground truth 2D pose as input) here. To evaluate it, put it into the ./checkpoint directory and run:

python run_poseformer.py -k gt -f 81 -c checkpoint --evaluate gt81f.bin

Training new models

To train a model from scratch (CPN detected 2D pose as input), run:

python run_poseformer.py -k cpn_ft_h36m_dbb -f 27 -lr 0.0001 -lrd 0.99

-f controls how many frames are used as input. With 27 frames the model achieves 47.0 mm; with 81 frames it achieves 44.3 mm (MPJPE).
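To make the role of the frame count concrete, the sketch below builds the window of consecutive 2D poses that would be fed to the model when predicting the 3D pose of one centre frame. It is an illustration under stated assumptions, not the repository's data generator: the function name `centered_window` is hypothetical, the array layout (frames, joints, 2) is assumed, and sequence boundaries are handled by repeating the first and last frame.

```python
import numpy as np


def centered_window(poses_2d, t, num_frames):
    """Return `num_frames` consecutive 2D poses centred on frame `t`.

    poses_2d: array of shape (T, J, 2), one 2D pose per video frame (assumed layout).
    Frames outside the sequence are filled by replicating the edge frames.
    """
    if num_frames % 2 == 0:
        raise ValueError("num_frames must be odd so the window has a centre frame")
    if not 0 <= t < len(poses_2d):
        raise IndexError("t is outside the sequence")
    pad = (num_frames - 1) // 2
    # Pad only the time axis; joints and coordinates are left untouched.
    padded = np.pad(poses_2d, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    # After padding, original frame t sits at index t + pad, the window centre.
    return padded[t : t + num_frames]


if __name__ == "__main__":
    sequence = np.random.rand(100, 17, 2)  # 100 frames, 17 joints
    window = centered_window(sequence, t=0, num_frames=27)
    print(window.shape)
```

A larger window gives the model more temporal context per prediction, which is consistent with the lower error reported for 81 frames, at the cost of more computation per frame.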

To train a model from scratch (Ground truth 2D pose as input), run:

python run_poseformer.py -k gt -f 81 -lr 0.0001 -lrd 0.99

81 frames achieves 31.3 mm (MPJPE).
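For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions, reported here in millimetres. The following NumPy sketch illustrates the computation; it is not the repository's evaluation code, and it assumes both arrays are already expressed in the same root-relative coordinate frame and in the same units.

```python
import numpy as np


def mpjpe(predicted, target):
    """Mean per-joint position error between two pose arrays.

    Both arrays are assumed to have shape (..., J, 3): any leading batch or
    frame axes, then joints, then xyz coordinates.
    """
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    # Euclidean distance per joint, then the mean over all joints and frames.
    per_joint_error = np.linalg.norm(predicted - target, axis=-1)
    return float(per_joint_error.mean())


if __name__ == "__main__":
    gt = np.zeros((2, 17, 3))
    pred = gt + np.array([3.0, 4.0, 0.0])  # every joint is off by 5 units
    print(mpjpe(pred, gt))
```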

Visualization and other functions

We keep our code consistent with VideoPose3D. Please refer to their project page for further information.

Bibtex

If you find our work useful in your research, please consider citing:

@article{zheng20213d,
title={3D Human Pose Estimation with Spatial and Temporal Transformers},
author={Zheng, Ce and Zhu, Sijie and Mendieta, Matias and Yang, Taojiannan and Chen, Chen and Ding, Zhengming},
journal={arXiv preprint arXiv:2103.10455},
year={2021}
}

Acknowledgement

Part of our code is borrowed from VideoPose3D. We thank the authors for releasing the code.

Owner

Ce Zheng
