Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"

Last update: Dec 26, 2022

Related tags

Deep Learning MotionCLIP

Overview

MotionCLIP

Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space".

Please visit our webpage for more details.

Bibtex

If you find this code useful in your research, please cite:

@article{tevet2022motionclip,
title={MotionCLIP: Exposing Human Motion Generation to CLIP Space},
author={Tevet, Guy and Gordon, Brian and Hertz, Amir and Bermano, Amit H and Cohen-Or, Daniel},
journal={arXiv preprint arXiv:2203.08063},
year={2022}
}

Getting started

1. Create conda environment

conda env create -f environment.yml
conda activate motionclip

The code was tested on Python 3.8 and PyTorch 1.8.1.

2. Download data

Download and unzip the above datasets and place them correspondingly:

AMASS -> ./data/amass (Download the SMPL+H version for each dataset separately, please note to download ALL the dataset in AMASS website)
BABEL -> ./data/babel_v1.0_release
Rendered AMASS images -> ./data/render

3. Download the SMPL body model

bash prepare/download_smpl_files.sh

This will download the SMPL neutral model from this github repo and additionnal files.

In addition, download the Extended SMPL+H model (used in AMASS project) from MANO, and place it in ./models/smplh.

4. Parse data

Process the three datasets into a unified dataset with (text, image, motion) triplets.

To parse acording to the AMASS split (for all applications except action recognition), run:

python -m src.datasets.amass_parser --dataset_name amass

Only if you intend to use Action Recognition, run also:

python -m src.datasets.amass_parser --dataset_name babel

Using the pretrained model

First, download the model and place it at ./exps/paper-model

1. Text-to-Motion

To reproduce paper results, run:

 python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt

To run MotionCLIP with your own texts, create a text file, with each line depicts a different text input (see paper_texts.txt as a reference) and point to it with --input_file instead.

2. Vector Editing

To reproduce paper results, run:

 python -m src.visualize.motion_editing ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_edits.csv

To gain the input motions, we support two modes:

data - Retrieve motions from train/validation sets, according to their textual label. On it first run, src.visualize.motion_editing generates a file containing a list of all textual labels. You can look it up and choose motions for your own editing.
text - The inputs are free texts, instead of motions. We use CLIP text encoder to get CLIP representations, perform vector editing, then use MotionCLIP decoder to output the edited motion.

To run MotionCLIP on your own editing, create a csv file, with each line depicts a different edit (see paper_edits.csv as a reference) and point to it with --input_file instead.

3. Interpolation

To reproduce paper results, run:

 python -m src.visualize.motion_interpolation ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_interps.csv

To gain the input motions, we use the data mode described earlier.

To run MotionCLIP on your own interpolations, create a csv file, with each line depicts a different interpolation (see paper_interps.csv as a reference) and point to it with --input_file instead.

4. Action Recognition

For action recognition, we use a model trained on text class names. Download and place it at ./exps/classes-model.

python -m src.utils.action_classifier ./exps/classes-model/checkpoint_0200.pth.tar

Train your own

NOTE (11/MAY/22): The paper model is not perfectly reproduced using this code. We are working to resolve this issue. The trained model checkpoint we provide does reproduce results.

To reproduce paper-model run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/amass_30fps_db.pt \
--folder ./exps/my-paper-model

To reproduce classes-model run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 95 --lambda_rc 95 --lambda_rcxyz 95 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/babel_30fps_db.pt \
--folder ./exps/my-classes-model

Acknowledgment

The code of the transformer model and the dataloader are based on ACTOR repository.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.

Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"

Related tags

Overview

MotionCLIP

Bibtex

Getting started

1. Create conda environment

2. Download data

3. Download the SMPL body model

4. Parse data

Using the pretrained model

1. Text-to-Motion

2. Vector Editing

3. Interpolation

4. Action Recognition

Train your own

Acknowledgment

License

Owner

Guy Tevet

Collection of Docker images for ML/DL and video processing projects

SegNet-like Autoencoders in TensorFlow

BRNet - code for Automated assessment of BI-RADS categories for ultrasound images using multi-scale neural networks with an order-constrained loss function

Unsupervised Image-to-Image Translation

Drslmarkov - Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Controlling Hill Climb Racing with Hand Tacking

NuPIC Studio is an all-in-one tool that allows users create a HTM neural network from scratch

Bayesian optimization in PyTorch

Hitters Linear Regression - Hitters Linear Regression With Python

Research on Event Accumulator Settings for Event-Based SLAM

Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

Genetic feature selection module for scikit-learn

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

The implementation of PEMP in paper "Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes"

Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer"

Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can ! 🤡

This script runs neural style transfer against the provided content image.

Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"

Related tags

Overview

MotionCLIP

Bibtex

Getting started

1. Create conda environment

2. Download data

3. Download the SMPL body model

4. Parse data

Using the pretrained model

1. Text-to-Motion

2. Vector Editing

3. Interpolation

4. Action Recognition

Train your own

Acknowledgment

License

Owner

Guy Tevet

Collection of Docker images for ML/DL and video processing projects

SegNet-like Autoencoders in TensorFlow

BRNet - code for Automated assessment of BI-RADS categories for ultrasound images using multi-scale neural networks with an order-constrained loss function

Unsupervised Image-to-Image Translation

Drslmarkov - Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks

Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Controlling Hill Climb Racing with Hand Tacking

NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch

Bayesian optimization in PyTorch

Hitters Linear Regression - Hitters Linear Regression With Python

Research on Event Accumulator Settings for Event-Based SLAM

Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

Genetic feature selection module for scikit-learn

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

The implementation of PEMP in paper "Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes"

Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer"

Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can ! 🤡

This script runs neural style transfer against the provided content image.

Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

NuPIC Studio is an all-in-one tool that allows users create a HTM neural network from scratch