Basics of 2D and 3D Human Pose Estimation.

Last update: Dec 14, 2022

Human Pose Estimation 101

If you want a slightly more rigorous tutorial on the basics of Human Pose Estimation and how the field has evolved, check out the articles I published on 2D Pose Estimation and 3D Pose Estimation.

Basics
Loss
Evaluation metrics
- PCP
- PCK
- PDJ
- MPJPE
- AUC
Important applications
Extra

Basics

Defined as the problem of localizing human joints (also known as keypoints)
An articulated body consists of rigid parts connected by joints. A body with strong articulation is a body with strong contortion.
Pose Estimation is the search for a specific pose in the space of all articulated poses
The number of keypoints varies with the dataset: LSP has 14, MPII has 16, and 16 are used in Human3.6M
Classified into 2D and 3D Pose Estimation
- 2D Pose Estimation
  - Estimate the 2D (x, y) pixel coordinates of each joint from an RGB image
- 3D Pose Estimation
  - Estimate the 3D (x, y, z) coordinates of each joint in metric space from an RGB image or, in earlier works, from RGB-D sensor data. (Research in the past few years has focused heavily on generating 3D poses from 2D images and 2D videos.)

Loss

Most commonly used loss function: Mean Squared Error (MSE, least squares loss)
This is a regression problem. The model is trained to output continuous coordinates and, by minimizing the MSE loss, regresses toward the correct coordinates, i.e. moves toward the ground-truth coordinates in small increments.
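A minimal sketch of the MSE loss on joint coordinates, plus one gradient step showing the "small increments" toward the ground truth. As a simplifying assumption, the coordinates are treated here as free parameters; in a real model the gradient flows back into the network weights instead. The helper names (`mse_loss`, `mse_grad`, `gradient_step`) and the toy values are illustrative only.

```python
def mse_loss(pred, gt):
    """Mean squared error between predicted and ground-truth joint coordinates.

    pred, gt: sequences of K joints, each a tuple of D coordinates (D = 2 or 3).
    The squared error is averaged over all K * D coordinate values.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of joints")
    squared = [(p - g) ** 2 for pj, gj in zip(pred, gt) for p, g in zip(pj, gj)]
    return sum(squared) / len(squared)


def mse_grad(pred, gt):
    """Gradient of mse_loss with respect to every predicted coordinate."""
    n = sum(len(pj) for pj in pred)  # total number of coordinate values
    return [tuple(2.0 * (p - g) / n for p, g in zip(pj, gj)) for pj, gj in zip(pred, gt)]


def gradient_step(pred, gt, lr=0.5):
    """One gradient-descent step: moves each coordinate a little toward the truth."""
    grad = mse_grad(pred, gt)
    return [tuple(p - lr * d for p, d in zip(pj, dj)) for pj, dj in zip(pred, grad)]


pred = [(0.0, 0.0), (2.0, 2.0)]  # two toy 2D joints
gt = [(1.0, 0.0), (2.0, 4.0)]
print(mse_loss(pred, gt))                     # squared errors 1, 0, 0, 4 -> 5 / 4 = 1.25
print(mse_loss(gradient_step(pred, gt), gt))  # 0.703125, lower than before the step
```

Repeating the step keeps shrinking the loss, which is the incremental regression toward the ground truth described above.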

Evaluation metrics

Percentage of Correct Parts - PCP

A limb is considered detected (a correct part) if the distances between both predicted joint locations and the corresponding true limb joint locations are at most half of the limb length (PCP@0.5)
Measures the detection rate of limbs
Cons: penalizes shorter limbs, since a shorter limb gets a smaller threshold
Calculation
- For a specific part, PCP = (No. of correct parts for the entire dataset) / (No. of total parts for the entire dataset)
- Take a dataset with 10 images and 1 pose per image. Each pose has 8 parts: (upper arm, lower arm, upper leg, lower leg) x 2
- No. of upper arms = 10 * 2 = 20
- No. of lower arms = 20
- No. of lower legs = No. of upper legs = 20
- If the upper arm is detected correctly for 17 of the 20 upper arms (10 right arms and 7 left) → PCP = 17/20 = 85%
Higher is better
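The PCP calculation above can be sketched as follows. This assumes the strict variant in which both endpoints must fall within `alpha` times the ground-truth limb length; some benchmarks use slightly different rules (e.g. averaging the two endpoint errors). The function names and toy coordinates are hypothetical.

```python
import math


def limb_is_correct(pred_limb, gt_limb, alpha=0.5):
    """PCP criterion for one limb (strict both-endpoints variant, assumed here).

    Each limb is a pair of joints ((x, y), (x, y)). The limb counts as correct if
    BOTH predicted endpoints lie within alpha * (ground-truth limb length)
    of the matching ground-truth endpoints.
    """
    (pa, pb), (ga, gb) = pred_limb, gt_limb
    threshold = alpha * math.dist(ga, gb)
    return math.dist(pa, ga) <= threshold and math.dist(pb, gb) <= threshold


def pcp(pred_limbs, gt_limbs, alpha=0.5):
    """PCP for one part type = correct limbs / total limbs over the dataset."""
    correct = sum(limb_is_correct(p, g, alpha) for p, g in zip(pred_limbs, gt_limbs))
    return correct / len(gt_limbs)


# Reproduce the worked example: 10 images x 2 upper arms = 20 limbs, 17 correct.
gt_arm = ((0.0, 0.0), (10.0, 0.0))    # toy upper arm of length 10 -> threshold 5
good = gt_arm                          # exact prediction
bad = ((0.0, 6.0), (10.0, 6.0))        # both endpoints 6 units off (> 5)
preds = [good] * 17 + [bad] * 3
gts = [gt_arm] * 20
print(pcp(preds, gts))  # 0.85
```

Because the threshold scales with each limb's own length, a short limb tolerates less absolute error, which is the drawback noted above.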

Percentage of Correct Key-points - PCK

A detected joint is considered correct if the distance between the predicted and the true joint is within a certain threshold (the threshold varies)
PCKh@0.5: the threshold is 50% of the head bone link
PCK@0.2 == Distance between predicted and true joint < 0.2 * torso diameter
Sometimes 150 mm is taken as the threshold
Keypoints: head, shoulder, elbow, wrist, hip, knee, ankle
PCK is used for both 2D and 3D (PCK3D)
Higher is better
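A minimal PCK sketch for one pose, assuming the threshold is `alpha * ref_length` and that a joint at exactly the threshold counts as correct. `ref_length` would be the head segment length for PCKh, the torso diameter for PCK@0.2, or simply `150.0` with `alpha=1.0` for a fixed 150 mm 3D threshold. Dataset-level PCK pools the joints of all poses in the same way. Names and values are illustrative.

```python
import math


def pck(pred, gt, ref_length, alpha):
    """Fraction of joints whose prediction is within alpha * ref_length of the truth.

    pred, gt: lists of joint coordinates (2D or 3D tuples) for one pose.
    ref_length: normalizing length (head segment for PCKh, torso diameter for
    PCK@0.2, or 150.0 with alpha=1.0 for a fixed 150 mm 3D threshold).
    """
    threshold = alpha * ref_length
    hits = [math.dist(p, g) <= threshold for p, g in zip(pred, gt)]
    return sum(hits) / len(hits)


gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
pred = [(3.0, 0.0), (10.0, 12.0), (20.0, 0.0), (30.0, 9.0)]  # errors: 3, 12, 0, 9
head_len = 20.0  # toy head segment length -> PCKh@0.5 threshold = 10
print(pck(pred, gt, head_len, 0.5))  # 3 of 4 joints within 10 -> 0.75
```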

Percentage of Detected Joints - PDJ

A detected joint is considered correct if the distance between the predicted and the true joint is within a certain fraction of the torso diameter
Alleviates the shorter-limb problem, since every joint's threshold is based on the torso diameter rather than on the length of its own limb
PDJ at 0.2 → Distance between predicted and true joint < 0.2 * torso diameter
Typically used for 2D Pose Estimation
Higher is better
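PDJ is usually reported at several fractions of the torso diameter, giving a curve. The sketch below assumes the torso diameter is the distance between the left shoulder and the right hip of the ground-truth pose (one common convention; the joint indices are dataset-specific and passed in explicitly). `torso_diameter` and `pdj_curve` are hypothetical helpers.

```python
import math


def torso_diameter(joints, left_shoulder, right_hip):
    """Assumed convention: distance between the left shoulder and the right hip."""
    return math.dist(joints[left_shoulder], joints[right_hip])


def pdj_curve(preds, gts, fractions, left_shoulder, right_hip):
    """PDJ at each fraction over a dataset of poses.

    preds, gts: lists of poses; each pose is a list of (x, y) joints.
    Returns {fraction: share of all joints detected at that fraction}.
    """
    curve = {}
    for f in fractions:
        detected = total = 0
        for pred, gt in zip(preds, gts):
            # Same threshold for every joint of this pose.
            threshold = f * torso_diameter(gt, left_shoulder, right_hip)
            for p, g in zip(pred, gt):
                detected += int(math.dist(p, g) < threshold)
                total += 1
        curve[f] = detected / total
    return curve


# Toy pose, joint order: 0 = left shoulder, 1 = right shoulder, 2 = left hip, 3 = right hip
gt = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0), (30.0, 40.0)]    # torso diameter = 50
pred = [(0.0, 0.0), (34.0, 0.0), (0.0, 48.0), (30.0, 52.0)]  # errors: 0, 4, 8, 12
curve = pdj_curve([pred], [gt], [0.1, 0.2], left_shoulder=0, right_hip=3)
print(curve)  # {0.1: 0.5, 0.2: 0.75}
```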

Mean Per Joint Position Error - MPJPE

Per joint position error = Euclidean distance between the ground truth and the prediction for a joint
Mean per joint position error = Mean of the per joint position errors over all k joints (typically, k = 16)
Calculated after aligning the root joints (typically the pelvis) of the estimated and ground-truth 3D poses
PA-MPJPE
- Procrustes analysis MPJPE
- MPJPE calculated after the estimated 3D pose is aligned to the ground truth by the Procrustes method
- The Procrustes method is simply a similarity transformation (rotation, translation, and uniform scaling) that best aligns the estimated pose to the ground truth in the least-squares sense
Lower is better
Used for 3D Pose Estimation
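A minimal sketch of root-aligned MPJPE for a single pose, assuming joints are (x, y, z) tuples in millimetres and the root is joint 0. PA-MPJPE would additionally solve for the best rotation, uniform scale, and translation (typically via an SVD) before the same averaging; that alignment step is not shown here. `mpjpe` and the toy poses are illustrative.

```python
import math


def mpjpe(pred, gt, root=0):
    """Root-aligned MPJPE for one 3D pose.

    pred, gt: lists of K joints as (x, y, z) tuples, in millimetres.
    root: index of the root joint (typically the pelvis); both poses are
    translated so their root joints coincide before measuring errors.
    """
    pr, gr = pred[root], gt[root]
    errors = []
    for p, g in zip(pred, gt):
        p_centered = [pc - rc for pc, rc in zip(p, pr)]
        g_centered = [gc - rc for gc, rc in zip(g, gr)]
        errors.append(math.dist(p_centered, g_centered))  # per joint position error
    return sum(errors) / len(errors)


gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 100.0, 0.0)]      # toy pose, root first
shifted = [(x + 50.0, y + 50.0, z + 50.0) for x, y, z in gt]      # same pose, translated
pred = [(50.0, 50.0, 50.0), (150.0, 50.0, 80.0), (50.0, 150.0, 50.0)]  # one joint 30 mm off
print(mpjpe(shifted, gt))  # 0.0: the translation is removed by root alignment
print(mpjpe(pred, gt))     # (0 + 30 + 0) / 3 = 10.0
```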

AUC
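AUC is commonly reported for 3D pose as the area under the PCK curve, normalized to [0, 1], as the threshold sweeps a range of distances. The sketch below assumes a convention often used with 3D PCK: thresholds from 0 to 150 mm in 5 mm steps, with the area approximated by the mean PCK over those thresholds. `pck_at` and `auc` are hypothetical helpers.

```python
def pck_at(errors, threshold):
    """Fraction of per-joint errors (mm) at or below the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


def auc(errors, max_threshold=150.0, step=5.0):
    """Normalized area under the PCK curve, approximated by averaging PCK over
    evenly spaced thresholds 0, step, ..., max_threshold (assumed convention)."""
    n = int(round(max_threshold / step))
    thresholds = [i * step for i in range(n + 1)]
    return sum(pck_at(errors, t) for t in thresholds) / len(thresholds)


print(auc([0.0, 200.0]))  # one perfect joint, one always wrong -> 0.5
```

Unlike a single PCK number, AUC rewards predictions that are close to the ground truth across the whole threshold range rather than just under one cutoff.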

Important Applications

Activity Analysis
Human-Computer Interaction (HCI)
Virtual Reality
Augmented Reality
Amazon Go is an important application domain for Human Pose Estimation: cameras track and recognize people and their actions, and Pose Estimation is an important component of that pipeline. Services that track and measure human activities rely heavily on Human Pose Estimation.

Informative roadmap on 2D Human Pose Estimation research

Presentation by Wei Yang

Owner

Sudharshan Chandra Babu

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution

Multi-Stage Spatial-Temporal Convolutional Neural Network (MS-GCN)

This is official implementaion of paper "Token Shift Transformer for Video Classification".

Rule based classification A hotel s customers dataset

Clustering is a popular approach to detect patterns in unlabeled data

Multi-task yolov5 with detection and segmentation based on yolov5

PyTorch implementation of paper "StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement" (ICCV 2021 Oral)

Air Pollution Prediction System using Linear Regression and ANN

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

MoCoPnet - Deformable 3D Convolution for Video Super-Resolution

A PyTorch Implementation of "Watch Your Step: Learning Node Embeddings via Graph Attention" (NeurIPS 2018).

This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

This repository contains the code for: RerrFact model for SciVer shared task

一个免费开源一键搭建的通用验证码识别平台，大部分常见的中英数验证码识别都没啥问题。

On Evaluation Metrics for Graph Generative Models

Implementation of algorithms for continuous control (DDPG and NAF).

Project looking into use of autoencoder for semi-supervised learning and comparing data requirements compared to supervised learning.

Keqing Chatbot With Python