Code repository for the paper "Tracking People with 3D Representations"

Last update: Dec 03, 2022

Related tags

Overview

Tracking People with 3D Representations

Code repository for the paper "Tracking People with 3D Representations" (paper link) (project site).
Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik.
Neural Information Processing Systems (NeurIPS), 2021.

This code repository provides a code implementation for our paper T3DP, with installation, preparing datasets, and evaluating on datasets, and a demo code to run on any youtube videos.

Abstract : We present a novel approach for tracking multiple people in video. Unlike past approaches which employ 2D representations, we focus on using 3D representations of people, located in three-dimensional space. To this end, we develop a method, Human Mesh and Appearance Recovery (HMAR) which in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture map on the triangles of the mesh. This serves as a 3D representation for appearance that is robust to viewpoint and pose changes. Given a video clip, we first detect bounding boxes corresponding to people, and for each one, we extract 3D appearance, pose, and location information using HMAR. These embedding vectors are then sent to a transformer, which performs spatio-temporal aggregation of the representations over the duration of the sequence. The similarity of the resulting representations is used to solve for associations that assigns each person to a tracklet. We evaluate our approach on the Posetrack, MuPoTs and AVA datasets. We find that 3D representations are more effective than 2D representations for tracking in these settings, and we obtain state-of-the-art performance.

Installation

We recommend creating a clean conda environment and install all dependencies. You can do this as follows:

conda env create -f _environment.yml

After the installation is complete you can activate the conda environment by running:

conda activate T3DP

Install PyOpenGL from this repository:

pip uninstall pyopengl
git clone https://github.com/mmatl/pyopengl.git
pip install ./pyopengl

Additionally, install Detectron2 from the official repository, if you need to run demo code on a local machine. We provide detections inside the _DATA folder, so for running the tracker on posetrack or mupots, you do not need to install Detectron2.

Download Data

We provide preprocessed files for PoseTrack and MuPoTs datasets (AVA files will be released soon!). Please download this folder and extract inside the main repository.

Training

To train the transformer model with posetrack data run,

python train_t3dp.py
--learning_rate 0.001
--lr_decay_epochs 10000,20000
--epochs 100000
--tags T3PO
--train_dataset posetrack_2018
--test_dataset posetrack_2018
--train_batch_size 32
--feature APK
--train

WANDB will create unique names for each run, and save the model names accordingly. Use this name for evaluation. We have also provided pretrained weights inside the _DATA folder.

Testing

Once the posetrack dataset is downloaded at "_DATA/Posetrack_2018/", run the following command to run our tracker on all validation videos.

python test_t3dp.py
--dataset "posetrack"
--dataset_path "_DATA/Posetrack_2018/"
--storage_folder "Videos_Final"
--render True
--save True

Evaluation

To evaluate the tracking performance on ID switches, MOTA, and IDF1 metrics, please run the following command.

python3 evaluate_t3dp.py out/Videos_Final/results/ t3dp posetrack

Demo

Please run the following command to run our method on a youtube video. This will download the youtube video from a given ID, and extract frames, run Detectron2, run HMAR and finally run our tracker and renders the video.

python3 demo.py

Results (Project site)

We evaluated our method on PoseTrack, MuPoTs and AVA datasets. Our results show significant improvements over the state-of-the-art methods on person tracking. For more results please visit our website.

Acknowledgements

Parts of the code are taken or adapted from the following repos:

Contact

Jathushan Rajasegaran - [email protected] or [email protected]
To ask questions or report issues, please open an issue on the issues tracker.
Discussions, suggestions and questions are welcome!

Citation

If you find this code useful for your research or the use data generated by our method, please consider citing the following paper:

@Inproceedings{rajasegaran2021tracking,
  title     = {Tracking People with 3D Representations},
  author    = {Rajasegaran, Jathushan and Pavlakos, Georgios and Kanazawa, Angjoo and Malik, Jitendra},
  Booktitle = {NeurIPS},
  year      = {2021}
}

Code repository for the paper "Tracking People with 3D Representations"

Related tags

Overview

Tracking People with 3D Representations

Installation

Download Data

Training

Testing

Evaluation

Demo

Results (Project site)

Acknowledgements

Contact

Citation

Owner

Jathushan Rajasegaran

[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination

Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US simulation

Graph WaveNet apdapted for brain connectivity analysis.

Efficient Speech Processing Tookit for Automatic Speaker Recognition

Graph Convolutional Networks in PyTorch

Goal of the project : Detecting Temporal Boundaries in Sign Language videos

Deep ViT Features as Dense Visual Descriptors

[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining

ROS Basics and TurtleSim

Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation

Large scale PTM - PPI relation extraction

This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree

A modular, research-friendly framework for high-performance and inference of sequence models at many scales

Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

Single-Stage 6D Object Pose Estimation, CVPR 2020

Pytorch code for "DPFM: Deep Partial Functional Maps" - 3DV 2021 (Oral)

Management Dashboard for Torchserve

The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question IntentionClassification Benchmark for Text-to-SQL"