Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space"

Overview

MotionCLIP

Official Pytorch implementation of the paper "MotionCLIP: Exposing Human Motion Generation to CLIP Space".

Please visit our webpage for more details.

teaser

Bibtex

If you find this code useful in your research, please cite:

@article{tevet2022motionclip,
title={MotionCLIP: Exposing Human Motion Generation to CLIP Space},
author={Tevet, Guy and Gordon, Brian and Hertz, Amir and Bermano, Amit H and Cohen-Or, Daniel},
journal={arXiv preprint arXiv:2203.08063},
year={2022}
}

Getting started

1. Create conda environment

conda env create -f environment.yml
conda activate motionclip

The code was tested on Python 3.8 and PyTorch 1.8.1.

2. Download data

Download and unzip the above datasets and place them correspondingly:

  • AMASS -> ./data/amass (Download the SMPL+H version for each dataset separately, please note to download ALL the dataset in AMASS website)
  • BABEL -> ./data/babel_v1.0_release
  • Rendered AMASS images -> ./data/render

3. Download the SMPL body model

bash prepare/download_smpl_files.sh

This will download the SMPL neutral model from this github repo and additionnal files.

In addition, download the Extended SMPL+H model (used in AMASS project) from MANO, and place it in ./models/smplh.

4. Parse data

Process the three datasets into a unified dataset with (text, image, motion) triplets.

To parse acording to the AMASS split (for all applications except action recognition), run:

python -m src.datasets.amass_parser --dataset_name amass

Only if you intend to use Action Recognition, run also:

python -m src.datasets.amass_parser --dataset_name babel

Using the pretrained model

First, download the model and place it at ./exps/paper-model

1. Text-to-Motion

To reproduce paper results, run:

 python -m src.visualize.text2motion ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_texts.txt

To run MotionCLIP with your own texts, create a text file, with each line depicts a different text input (see paper_texts.txt as a reference) and point to it with --input_file instead.

2. Vector Editing

To reproduce paper results, run:

 python -m src.visualize.motion_editing ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_edits.csv

To gain the input motions, we support two modes:

  • data - Retrieve motions from train/validation sets, according to their textual label. On it first run, src.visualize.motion_editing generates a file containing a list of all textual labels. You can look it up and choose motions for your own editing.
  • text - The inputs are free texts, instead of motions. We use CLIP text encoder to get CLIP representations, perform vector editing, then use MotionCLIP decoder to output the edited motion.

To run MotionCLIP on your own editing, create a csv file, with each line depicts a different edit (see paper_edits.csv as a reference) and point to it with --input_file instead.

3. Interpolation

To reproduce paper results, run:

 python -m src.visualize.motion_interpolation ./exps/paper-model/checkpoint_0100.pth.tar --input_file assets/paper_interps.csv

To gain the input motions, we use the data mode described earlier.

To run MotionCLIP on your own interpolations, create a csv file, with each line depicts a different interpolation (see paper_interps.csv as a reference) and point to it with --input_file instead.

4. Action Recognition

For action recognition, we use a model trained on text class names. Download and place it at ./exps/classes-model.

python -m src.utils.action_classifier ./exps/classes-model/checkpoint_0200.pth.tar

Train your own

NOTE (11/MAY/22): The paper model is not perfectly reproduced using this code. We are working to resolve this issue. The trained model checkpoint we provide does reproduce results.

To reproduce paper-model run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 100 --lambda_rc 100 --lambda_rcxyz 100 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/amass_30fps_db.pt \
--folder ./exps/my-paper-model

To reproduce classes-model run:

python -m src.train.train --clip_text_losses cosine --clip_image_losses cosine --pose_rep rot6d \
--lambda_vel 95 --lambda_rc 95 --lambda_rcxyz 95 \
--jointstype vertices --batch_size 20 --num_frames 60 --num_layers 8 \
--lr 0.0001 --glob --translation --no-vertstrans --latent_dim 512 --num_epochs 500 --snapshot 10 \
--device <GPU DEVICE ID> \
--datapath ./data/amass_db/babel_30fps_db.pt \
--folder ./exps/my-classes-model

Acknowledgment

The code of the transformer model and the dataloader are based on ACTOR repository.

License

This code is distributed under an MIT LICENSE.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.

Owner
Guy Tevet
CS PhD student
Guy Tevet
Collection of Docker images for ML/DL and video processing projects

Collection of Docker images for ML/DL and video processing projects. Overview of images Three types of images differ by tag postfix: base: Python with

OSAI 87 Nov 22, 2022
SegNet-like Autoencoders in TensorFlow

SegNet SegNet is a TensorFlow implementation of the segmentation network proposed by Kendall et al., with cool features like strided deconvolution, a

Andrea Azzini 66 Nov 05, 2021
BRNet - code for Automated assessment of BI-RADS categories for ultrasound images using multi-scale neural networks with an order-constrained loss function

BRNet code for "Automated assessment of BI-RADS categories for ultrasound images using multi-scale neural networks with an order-constrained loss func

Yong Pi 2 Mar 09, 2022
Unsupervised Image-to-Image Translation

UNIT: UNsupervised Image-to-image Translation Networks Imaginaire Repository We have a reimplementation of the UNIT method that is more performant. It

Ming-Yu Liu 劉洺堉 1.9k Dec 26, 2022
Drslmarkov - Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks

Distributionally Robust Structure Learning for Discrete Pairwise Markov Networks

1 Nov 24, 2022
Adversarial Graph Representation Adaptation for Cross-Domain Facial Expression Recognition (AGRA, ACM 2020, Oral)

Cross Domain Facial Expression Recognition Benchmark Implementation of papers: Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchm

89 Dec 09, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch

NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch, train it, collect statistics, and share it among the members of the community. It is not just a visual

HTM Community 93 Sep 30, 2022
Bayesian optimization in PyTorch

BoTorch is a library for Bayesian Optimization built on PyTorch. BoTorch is currently in beta and under active development! Why BoTorch ? BoTorch Prov

2.5k Dec 31, 2022
Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

AyseBuyukcelik 2 Jan 26, 2022
Research on Event Accumulator Settings for Event-Based SLAM

Research on Event Accumulator Settings for Event-Based SLAM This is the source code for paper "Research on Event Accumulator Settings for Event-Based

Robin Shaun 26 Dec 21, 2022
Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

S2HAND: Model-based 3D Hand Reconstruction via Self-Supervised Learning S2HAND presents a self-supervised 3D hand reconstruction network that can join

Yujin Chen 72 Dec 12, 2022
Red Team tool for exfiltrating files from a target's Google Drive that you have access to, via Google's API.

GD-Thief Red Team tool for exfiltrating files from a target's Google Drive that you(the attacker) has access to, via the Google Drive API. This includ

Antonio Piazza 39 Dec 27, 2022
Genetic feature selection module for scikit-learn

sklearn-genetic Genetic feature selection module for scikit-learn Genetic algorithms mimic the process of natural selection to search for optimal valu

Manuel Calzolari 260 Dec 14, 2022
Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Talk-to-Edit (ICCV2021) This repository contains the implementation of the following paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog Yumin

Yuming Jiang 221 Jan 07, 2023
The implementation of PEMP in paper "Prior-Enhanced Few-Shot Segmentation with Meta-Prototypes"

Prior-Enhanced network with Meta-Prototypes (PEMP) This is the PyTorch implementation of PEMP. Overview of PEMP Meta-Prototypes & Adaptive Prototypes

Jianwei ZHANG 8 Oct 14, 2021
Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer"

StyleAttack Code and data of the EMNLP 2021 paper "Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer" Prepare Pois

THUNLP 19 Nov 20, 2022
Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can ! 🤡

Customers Segmentation using PHP and Rubix ML PHP Library Can we do Customers Segmentation using PHP and Unsupervized Machine Learning ? Yes we can !

Mickaël Andrieu 11 Oct 08, 2022
This script runs neural style transfer against the provided content image.

Neural Style Transfer Content Style Output Description: This script runs neural style transfer against the provided content image. The content image m

Martynas Subonis 0 Nov 25, 2021
Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

AttnGAN Pytorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative

Tao Xu 1.2k Dec 26, 2022