Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Overview

Talk-to-Edit (ICCV2021)

Python 3.7 pytorch 1.6.0

This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset]

Overview

overall_structure

Dependencies and Installation

  1. Clone Repo

    git clone [email protected]:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit
    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age). And the target_val can be [0, 1, 2, 3, 4, 5].

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system is able to edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age).
    • When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, You can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily.".
    • To end the conversation, just tell the system things like "That's all" / "Nothing else, thank you."
  4. By default, the above editing would be performed on the teaser image. You may change the image to be edited in two ways: 1) change line 11: latent_code_index to other values ranging from 0 to 99; 2) set line 10: latent_code_path to ~, so that an image would be randomly generated.

  5. If you want to try editing on real images, you may download the real images from this link and put them under ./download/real_images. You could also provide other real images at your choice. You need to change line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml according to the path to the real image and set line 11: is_real_image as True.

  6. You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in config files.

Train the Semantic Field

  1. To train the Semantic Field, a number of sampled latent codes should be prepared and then we use the attribute predictor to predict the facial attributes for their corresponding images. The attribute predictor is trained using fine-grained annotations in CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the train data from this link and put them under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We will also use some editing latent codes to monitor the training phase. You can download the editing latent code from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files in the training process, e.g., log message, checkpoints, and snapshots, will be saved to ./experiments and ./tb_logger directory.

  4. There are 10 configuration files under ./configs/train, named in the format of field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the corresponding configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field which edits the attribute Bangs in 128x128 image resolution, simply run:

    python train.py --opt ./configs/train/field_128_Bangs.yml

Quantitative Results

We provide codes for quantitative results shown in Table 1. Here we use Bangs in 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metircs. Change image_num for different attribute accordingly: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148

Qualitative Results

result

CelebA-Dialog Dataset

result

Our CelebA-Dialog Dataset is available at link.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Accompanied with each image, there are captions describing the attributes and a user request sample.

result

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@InProceedings{jiang2021talkedit,
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Contact

If you have any question, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.

Owner
Yuming Jiang
[email protected], Ph.D. Student
Yuming Jiang
🕺Full body detection and tracking

Pose-Detection 🤔 Overview Human pose estimation from video plays a critical role in various applications such as quantifying physical exercises, sign

Abbas Ataei 20 Nov 21, 2022
DrWhy is the collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models.

Responsible Machine Learning With Great Power Comes Great Responsibility. Voltaire (well, maybe) How to develop machine learning models in a responsib

Model Oriented 590 Dec 26, 2022
Quickly and easily create / train a custom DeepDream model

Dream-Creator This project aims to simplify the process of creating a custom DeepDream model by using pretrained GoogleNet models and custom image dat

55 Dec 27, 2022
MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark We propose a benchmark to evaluate different quantization algorithms on vari

494 Dec 29, 2022
Contextual Attention Localization for Offline Handwritten Text Recognition

CALText This repository contains the source code for CALText model introduced in "CALText: Contextual Attention Localization for Offline Handwritten T

0 Feb 17, 2022
Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch

Implementing SYNTHESIZER: Rethinking Self-Attention in Transformer Models using Pytorch Reference Paper URL Author: Yi Tay, Dara Bahri, Donald Metzler

Myeongjun Kim 66 Nov 30, 2022
5 Jan 05, 2023
Epidemiology analysis package

zEpid zEpid is an epidemiology analysis package, providing easy to use tools for epidemiologists coding in Python 3.5+. The purpose of this library is

Paul Zivich 111 Jan 08, 2023
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
Advantage Actor Critic (A2C): jax + flax implementation

Advantage Actor Critic (A2C): jax + flax implementation Current version supports only environments with continious action spaces and was tested on muj

Andrey 3 Jan 23, 2022
Language Models Can See: Plugging Visual Controls in Text Generation

Language Models Can See: Plugging Visual Controls in Text Generation Authors: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lin

Yixuan Su 195 Dec 22, 2022
A library of scripts that interact with the PythonTurtle module to create games, drawings, and more

TurtleLib TurtleLib is a library of scripts that interact with the PythonTurtle module to create games, drawings, and more! Using the Scripts Copy or

1 Jan 15, 2022
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

Katsuya Hyodo 16 Dec 22, 2022
Official repository for Hierarchical Opacity Propagation for Image Matting

HOP-Matting Official repository for Hierarchical Opacity Propagation for Image Matting 🚧 🚧 🚧 Under Construction 🚧 🚧 🚧 🚧 🚧 🚧   Coming Soon   

Li Yaoyi 54 Dec 30, 2021
Python parser for DTED data.

DTED Parser This is a package written in pure python (with help from numpy) to parse and investigate Digital Terrain Elevation Data (DTED) files. This

Ben Bonenfant 12 Dec 18, 2022
Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

LOREN Resources for our AAAI 2022 paper (pre-print): "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification". DEMO System Check out o

Jiangjie Chen 37 Dec 27, 2022
CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

CapsuleVOS This is the code for the ICCV 2019 paper CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing. Arxiv Link: https://a

53 Oct 27, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Official code release for: EditGAN: High-Precision Semantic Image Editing

Official code release for: EditGAN: High-Precision Semantic Image Editing

565 Jan 05, 2023
Tensorflow-seq2seq-tutorials - Dynamic seq2seq in TensorFlow, step by step

seq2seq with TensorFlow Collection of unfinished tutorials. May be good for educational purposes. 1 - simple sequence-to-sequence model with dynamic u

Matvey Ezhov 1k Dec 17, 2022