[CVPR 2022] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

Overview

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing (CVPR 2022)

teaser

This repository provides the official PyTorch implementation for the following paper:

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Yanbo Xu*, Yueqin Yin*, Liming Jiang, Qianyi Wu, Chengyao Zheng, Chen Change Loy, Bo Dai, Wayne Wu
In CVPR 2022. (* denotes equal contribution)
Project Page | Paper

Abstract: Recent advances like StyleGAN have promoted the growth of controllable facial editing. To address its core challenge of attribute decoupling in a single latent space, attempts have been made to adopt dual-space GAN for better disentanglement of style and content representations. Nonetheless, these methods are still incompetent to obtain plausible editing results with high controllability, especially for complicated attributes. In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. We propose TransEditor, a novel Transformer-based framework to enhance such interaction. Besides, we develop a new dual-space editing and inversion strategy to provide additional editing flexibility. Extensive experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.

Requirements

A suitable Anaconda environment named transeditor can be created and activated with:

conda env create -f environment.yaml
conda activate transeditor

Dataset Preparation

Datasets CelebA-HQ Flickr-Faces-HQ (FFHQ)
  • You can use download.sh in StyleMapGAN to download the CelebA-HQ dataset raw images and create the LMDB dataset format, similar for the FFHQ dataset.

Download Pretrained Models

  • The pretrained models can be downloaded from TransEditor Pretrained Models.
  • The age classifier and gender classifier for the FFHQ dataset can be found at pytorch-DEX.
  • The out/ folder and psp_out/ folder should be put under the TransEditor/ root folder, the pth/ folder should be put under the TransEditor/our_interfaceGAN/ffhq_utils/dex folder.

Training New Networks

To train the TransEditor network, run

python train_spatial_query.py $DATA_DIR --exp_name $EXP_NAME --batch 16 --n_sample 64 --num_region 1 --num_trans 8

For the multi-gpu distributed training, run

python -m torch.distributed.launch --nproc_per_node=$GPU_NUM --master_port $PORT_NUM train_spatial_query.py $DATA_DIR --exp_name $EXP_NAME --batch 16 --n_sample 64 --num_region 1 --num_trans 8

To train the encoder-based inversion network, run

# FFHQ
python psp_spatial_train.py $FFHQ_DATA_DIR --test_path $FFHQ_TEST_DIR --ckpt .out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --start_from_latent_avg --exp_dir $INVERSION_EXP_NAME --from_plus_space 

# CelebA-HQ
python psp_spatial_train.py $CELEBA_DATA_DIR --test_path $CELEBA_TEST_DIR --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --num_region 1 --num_trans 8 --start_from_latent_avg --exp_dir $INVERSION_EXP_NAME --from_plus_space 

Testing (Image Generation/Interpolation)

# sampled image generation
python test_spatial_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --sample

# interpolation
python test_spatial_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --dat_interp

Inversion

We provide two kinds of inversion methods.

Encoder-based inversion

# FFHQ
python dual_space_encoder_test.py --checkpoint_path ./psp_out/transeditor_inversion_ffhq/checkpoints/best_model.pt --output_dir ./projection --num_region 1 --num_trans 8 --start_from_latent_avg --from_plus_space --dataset_type ffhq_encode --dataset_dir /dataset/ffhq/test/images

# CelebA-HQ
python dual_space_encoder_test.py --checkpoint_path ./psp_out/transeditor_inversion_celeba/checkpoints/best_model.pt --output_dir ./projection --num_region 1 --num_trans 8 --start_from_latent_avg --from_plus_space --dataset_type celebahq_encode --dataset_dir /dataset/celeba_hq/test/images

Optimization-based inversion

# FFHQ
python projector_optimization.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --dataset_dir /dataset/ffhq/test/images --step 10000

# CelebA-HQ
python projector_optimization.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --num_region 1 --num_trans 8 --dataset_dir /dataset/celeba_hq/test/images --step 10000

Image Editing

  • The attribute classifiers for CelebA-HQ datasets can be found in celebahq-classifiers.
  • Rename the folder as pth_celeba and put it under the our_interfaceGAN/celeba_utils/ folder.
CelebA_Attributes attribute_index
Male 0
Smiling 1
Wavy hair 3
Bald 8
Bangs 9
Black hair 12
Blond hair 13

For sampled image editing, run

# FFHQ
python our_interfaceGAN/edit_all_noinversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name pose --num_sample 150000 # pose
python our_interfaceGAN/edit_all_noinversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name gender --num_sample 150000 # gender

# CelebA-HQ
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 0 --num_sample 150000 # Male
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 3 --num_sample 150000 # wavy hair
python our_interfaceGAN/edit_all_noinversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_name pose --num_sample 150000 # pose

For real image editing, run

# FFHQ
python our_interfaceGAN/edit_all_inversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name pose --z_latent ./projection/encoder_inversion/ffhq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/ffhq_encode/encoded_p.npy # pose

python our_interfaceGAN/edit_all_inversion_ffhq.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --attribute_name gender --z_latent ./projection/encoder_inversion/ffhq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/ffhq_encode/encoded_p.npy # gender

# CelebA-HQ
python our_interfaceGAN/edit_all_inversion_celebahq.py --ckpt ./out/transeditor_celeba/checkpoint/370000.pt --attribute_index 0 --z_latent ./projection/encoder_inversion/celebahq_encode/encoded_z.npy --p_latent ./projection/encoder_inversion/celebahq_encode/encoded_p.npy # Male

Evaluation Metrics

# calculate fid, lpips, ppl
python metrics/evaluate_query.py --ckpt ./out/transeditor_ffhq/checkpoint/790000.pt --num_region 1 --num_trans 8 --batch 64 --inception metrics/inception_ffhq.pkl --truncation 1 --ppl --lpips --fid

Results

Image Interpolation

interp_p_celeba

interp_p_celeba

interp_z_celeba

interp_z_celeba

Image Editing

edit_pose_ffhq

edit_ffhq_pose

edit_gender_ffhq

edit_ffhq_gender

edit_smile_celebahq

edit_celebahq_smile

edit_blackhair_celebahq

edit_blackhair_celebahq

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{xu2022transeditor,
  title={{TransEditor}: Transformer-Based Dual-Space {GAN} for Highly Controllable Facial Editing},
  author={Xu, Yanbo and Yin, Yueqin and Jiang, Liming and Wu, Qianyi and Zheng, Chengyao and Loy, Chen Change and Dai, Bo and Wu, Wayne},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

Acknowledgments

The code is developed based on TransStyleGAN. We appreciate the nice PyTorch implementation.

Owner
Billy XU
Billy XU
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma 🔥 News 2021-10

Jingtao Zhan 99 Dec 27, 2022
Few-Shot-Intent-Detection includes popular challenging intent detection datasets with/without OOS queries and state-of-the-art baselines and results.

Few-Shot-Intent-Detection Few-Shot-Intent-Detection is a repository designed for few-shot intent detection with/without Out-of-Scope (OOS) intents. It

Jian-Guo Zhang 73 Dec 26, 2022
Analysis code and Latex source of the manuscript describing the conditional permutation test of confounding bias in predictive modelling.

Git repositoty of the manuscript entitled Statistical quantification of confounding bias in predictive modelling by Tamas Spisak The manuscript descri

PNI - Predictive Neuroimaging Lab, University Hospital Essen, Germany 0 Nov 22, 2021
3D-Reconstruction 基于深度学习方法的单目多视图三维重建

基于深度学习方法的单目多视图三维重建 Part I 三维重建 代码:Part1 技术文档:[Markdown] [PDF] 原始图像:Original Images 点云结果:Point Cloud Results-1

HMT_Curo 19 Dec 26, 2022
Another pytorch implementation of FCN (Fully Convolutional Networks)

FCN-pytorch-easiest Trying to be the easiest FCN pytorch implementation and just in a get and use fashion Here I use a handbag semantic segmentation f

Y. Dong 158 Dec 21, 2022
A note taker for NVDA. Allows the user to create, edit, view, manage and export notes to different formats.

Quick Notetaker add-on for NVDA The Quick Notetaker add-on is a wonderful tool which allows writing notes quickly and easily anytime and from any app

5 Dec 06, 2022
[KDD 2021, Research Track] DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neural Networks

DiffMG This repository contains the code for our KDD 2021 Research Track paper: DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neura

AutoML Research 24 Nov 29, 2022
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.

Visdom A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Python. Overview Concepts Setup Usage API To

FOSSASIA 9.4k Jan 07, 2023
A project which aims to protect your privacy using inexpensive hardware and easily modifiable software

Protecting your privacy using an ESP32, an IR sensor and a python script This project, which I personally call the "never-gonna-catch-me-in-the-act-ev

8 Oct 10, 2022
Code for testing convergence rates of Lipschitz learning on graphs

📈 LipschitzLearningRates The code in this repository reproduces the experimental results on convergence rates for k-nearest neighbor graph infinity L

2 Dec 20, 2021
AFLFast (extends AFL with Power Schedules)

AFLFast Power schedules implemented by Marcel Böhme [email protected]

Marcel Böhme 380 Jan 03, 2023
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.

What's New Below we share, in reverse chronological order, the updates and new releases in VISSL. All VISSL releases are available here. [Oct 2021]: V

Meta Research 2.9k Jan 07, 2023
Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord.

numpy2tfrecord Simple helper library to convert a collection of numpy data to tfrecord, and build a tensorflow dataset from the tfrecord. Installation

Ryo Yonetani 2 Jan 16, 2022
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" on Semantic Segmentation.

Swin Transformer for Semantic Segmentation of satellite images This repo contains the supported code and configuration files to reproduce semantic seg

23 Oct 10, 2022
PyTorch implementation for 3D human pose estimation

Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach This repository is the PyTorch implementation for the network presented in:

Xingyi Zhou 579 Dec 22, 2022
BERT model training impelmentation using 1024 A100 GPUs for MLPerf Training v1.1

Pre-trained checkpoint and bert config json file Location of checkpoint and bert config json file This MLCommons members Google Drive location contain

SAIT (Samsung Advanced Institute of Technology) 12 Apr 27, 2022
Traditional deepdream with VQGAN+CLIP and optical flow. Ready to use in Google Colab

VQGAN-CLIP-Video cat.mp4 policeman.mp4 schoolboy.mp4 forsenBOG.mp4

23 Oct 26, 2022
Wordplay, an artificial Intelligence based crossword puzzle solver.

Wordplay, AI based crossword puzzle solver A crossword is a word puzzle that usually takes the form of a square or a rectangular grid of white- and bl

Vaibhaw 4 Nov 16, 2022
Official implementation of EfficientPose

EfficientPose This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet

2 May 17, 2022