HHP-Net: A light Heteroscedastic neural network for Head Pose estimation with uncertainty

Related tags

Deep LearningHHP-Net
Overview

HHP-Net: A light Heteroscedastic neural network for Head Pose estimation with uncertainty

Giorgio Cantarini, Francesca Odone, Nicoletta Noceti, Federico Tomenotti - WACV 2022

Abstract: In this paper we introduce a novel method to estimate the head pose of people in single images starting from a small set of head keypoints. To this purpose, we propose a regression model that exploits keypoints and outputs the head pose represented by yaw, pitch, and roll. Our model is simple to implement and more efficient with respect to the state of the art -- faster in inference and smaller in terms of memory occupancy -- with comparable accuracy. Our method also provides a measure of the heteroscedastic uncertainties associated with the three angles, through an appropriately designed loss function. As an example application, we address social interaction analysis in images: we propose an algorithm for a quantitative estimation of the level of interaction between people, starting from their head poses and reasoning on their mutual positions. ArXiv

Any questions or discussions are welcomed!

Installation

To download the repository:

git clone https://github.com/cantarinigiorgio/HHP-Net

To install the requirements:

pip install -r requirements.txt

Network architecture

Demo

There are different choices for the key points detector: in this repository we propose two variants

  • a normal version, very precise but less efficient
  • a faster version less accurate but faster

Normal version

We test three different backbones of CenterNet (HourGlass104, Resnet50V2 and Resnet50V1 available in the TensorFlow 2 Detection Model Zoo); each model takes as input 512x512 images.

Download one of the previous model (e.g. HourGlass104) then extract it to HHP-Net/centernet/ with:

tar -zxvf centernet_hg104_512x512_kpts_coco17_tpu-32.tar.gz -C /HHP-Net/centernet

To make inference on a single image, run:

python inference_on_image.py [--detection-model PATH_DETECTION_MODEL] [--hhp-model PATH_HHPNET] [--image PATH_IMAGE]  

To make inference on frames from the webcam, run:

python inference_on_webcam.py [--detection-model PATH_DETECTION_MODEL] [--hhp-model PATH_HHPNET] 

Faster version

To estimate the keypoints firstly we use an object detection model for detecting people; then we exploit a model to estimate the pose of each people detected by the previous model in the image.

In order to detect people we test Centernet MobilenetV2: download it and then extract it to HHP-Net/centernet/:

tar -zxvf centernet_mobilenetv2fpn_512x512_coco17_od.tar.gz -C /HHP-Net/centernet

Then download Posenet for pose estimation and move to HHP-Net/posenet/

mv posenet_mobilenet_v1_100_257x257_multi_kpt_stripped.tflite HHP-Net/posenet/

To make inference on a single image, run:

python fast_inference_on_image.py [--detection-model PATH_MODEL_DETECTION] [--pose-model PATH_MODEL_POSE] [--hhp-model PATH_HHPNET] [--image PATH_IMAGE] 

To make inference on frames from the webcam, run:

python fast_inference_on_webcam.py [--detection-model PATH_MODEL_DETECTION] [--pose-model PATH_MODEL_POSE] [--hhp-model PATH_HHPNET] 

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@misc{cantarini2021hhpnet,
      title={HHP-Net: A light Heteroscedastic neural network for Head Pose estimation with uncertainty}, 
      author={Giorgio Cantarini and Federico Figari Tomenotti and Nicoletta Noceti and Francesca Odone},
      year={2021},
      eprint={2111.01440},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Code Author

  • Giorgio Cantarini - Imavis s.r.l. and Malga (Machine Learning Genoa Center)
Owner
Computer Vision Engineer at Imavis s.r.l.
Photo2cartoon - 人像卡通化探索项目 (photo-to-cartoon translation project)

人像卡通化 (Photo to Cartoon) 中文版 | English Version 该项目为小视科技卡通肖像探索项目。您可使用微信扫描下方二维码或搜索“AI卡通秀”小程序体验卡通化效果。

Minivision_AI 3.5k Dec 30, 2022
MIMO-UNet - Official Pytorch Implementation

MIMO-UNet - Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Rethinking Coarse-to-

Sungjin Cho 248 Jan 02, 2023
Random-Afg - Afghanistan Random Old Idz Cloner Tools

AFGHANISTAN RANDOM OLD IDZ CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 5 Jan 26, 2022
"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

MEGVII Research 179 Jan 02, 2023
Codes for the compilation and visualization examples to the HIF vegetation dataset

High-impedance vegetation fault dataset This repository contains the codes that compile the "Vegetation Conduction Ignition Test Report" data, which a

1 Dec 12, 2021
Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (MTCNN)

Face-Detection-with-MTCNN Face detection is a computer vision problem that involves finding faces in photos. It is a trivial problem for humans to sol

Chetan Hirapara 3 Oct 07, 2022
Facial recognition project

Facial recognition project documentation Project introduction This project is developed by linuxu. It is a face model recognition project developed ba

Jefferson 2 Dec 04, 2022
Using modified BiSeNet for face parsing in PyTorch

face-parsing.PyTorch Contents Training Demo References Training Prepare training data: -- download CelebAMask-HQ dataset -- change file path in the pr

zll 1.6k Jan 08, 2023
Episodic-memory - Ego4D Episodic Memory Benchmark

Ego4D Episodic Memory Benchmark EGO4D is the world's largest egocentric (first p

3 Feb 18, 2022
Multi-angle c(q)uestion answering

Macaw Introduction Macaw (Multi-angle c(q)uestion answering) is a ready-to-use model capable of general question answering, showing robustness outside

AI2 430 Jan 04, 2023
History Aware Multimodal Transformer for Vision-and-Language Navigation

History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra

Shizhe Chen 46 Nov 23, 2022
Official Implementation (PyTorch) of "Point Cloud Augmentation with Weighted Local Transformations", ICCV 2021

PointWOLF: Point Cloud Augmentation with Weighted Local Transformations This repository is the implementation of PointWOLF(To appear). Sihyeon Kim1*,

MLV Lab (Machine Learning and Vision Lab at Korea University) 16 Nov 03, 2022
deep learning model with only python and numpy with test accuracy 99 % on mnist dataset and different optimization choices

deep_nn_model_with_only_python_100%_test_accuracy deep learning model with only python and numpy with test accuracy 99 % on mnist dataset and differen

0 Aug 28, 2022
PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

PointCloud Annotation Tools, support to label object bound box, ground, lane and kerb

halo 368 Dec 06, 2022
This repository contains a CBIR system that uses swin transformer to extract image's feature.

Swin-transformer based CBIR This repository contains a CBIR(content-based image retrieval) system. Here we use Swin-transformer to extract query image

JsHou 12 Nov 17, 2022
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)

Python Streaming Anomaly Detection (PySAD) PySAD is an open-source python framework for anomaly detection on streaming multivariate data. Documentatio

Selim Firat Yilmaz 181 Dec 18, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
StocksMA is a package to facilitate access to financial and economic data of Moroccan stocks.

Creating easier access to the Moroccan stock market data What is StocksMA ? StocksMA is a package to facilitate access to financial and economic data

Salah Eddine LABIAD 28 Jan 04, 2023
Implementation for the "Surface Reconstruction from 3D Line Segments" paper.

Surface Reconstruction from 3D Line Segments Surface reconstruction from 3d line segments. Langlois, P. A., Boulch, A., & Marlet, R. In 2019 Internati

85 Jan 04, 2023
Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling

TGraM Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling, Qibin He, Xian Sun, Zhiyuan Yan, Beibei Li, Kun Fu Abstract Rece

Qibin He 6 Nov 25, 2022