Invertible conditional GANs for image editing

Invertible Conditional GANs

A real image is encoded into a latent representation z and conditional information y, and then decoded into a new image. We fix z for every row, and modify y for each column to obtain variations in real samples.
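The grid described above can be sketched in a few lines of Python. This is a toy illustration, not the repository's Torch code: `encode_z`, `encode_y` and `decode` are hypothetical stand-ins for encoder Z, encoder Y and the generator, and flipping one binary attribute per column is an assumption made for brevity.

```python
def variation_grid(images, encode_z, encode_y, decode, n_attributes):
    """Row i holds decodings of image i: z is fixed, y changes per column."""
    grid = []
    for x in images:
        z = encode_z(x)          # latent code, shared by the whole row
        y = encode_y(x)          # predicted attribute vector
        row = [decode(z, y)]     # column 0: plain reconstruction
        for k in range(n_attributes):
            y_mod = list(y)
            y_mod[k] = 1 - y_mod[k]   # flip attribute k (binary attributes assumed)
            row.append(decode(z, y_mod))
        grid.append(row)
    return grid


# Toy stand-ins (hypothetical): an "image" is a number, "decoding" returns a tuple.
toy_encode_z = lambda x: x * 10
toy_encode_y = lambda x: [0, 1]
toy_decode = lambda z, y: (z, tuple(y))

grid = variation_grid([1, 2], toy_encode_z, toy_encode_y, toy_decode, n_attributes=2)
```

Every entry in a row shares the same z, so differences within a row come only from the modified y.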

This is the implementation of the IcGAN model proposed in our paper:

Invertible Conditional GANs for image editing. November 2016.

This paper is a summarized and updated version of my master's thesis, which you can find here:

Master thesis: Invertible Conditional Generative Adversarial Networks. September 2016.

The baseline used is the Torch implementation of the DCGAN by Radford et al.

  1. Training the model
    1. Face dataset: CelebA
    2. Digit dataset: MNIST
  2. Pre-trained CelebA model
  3. Visualize the results
    1. Reconstruct and modify real images
    2. Swap attributes
    3. Interpolate between faces

Requisites

Please refer to the DCGAN Torch repository for the requirements and dependencies needed to run the code. Additionally, you will need to install the threads and optnet packages:

luarocks install threads

luarocks install optnet

In order to interactively display the results, follow these steps.

1. Training the model

Model overview

The IcGAN is trained in four steps.

  1. Train the generator.
  2. Create a dataset of generated images with the generator.
  3. Train the encoder Z to map an image x to a latent representation z using the dataset of generated images.
  4. Train the encoder Y to map an image x to a conditional information vector y with the dataset of real images.

All the parameters of the training phase are located in cfg/mainConfig.lua.

There is already a pre-trained model for CelebA available in case you want to skip the training part. Here you can find instructions on how to use it.

1.1 Train with a face dataset: CelebA

Note: for speed, the whole dataset is loaded into RAM during training, which requires about 10 GB of RAM. Therefore, 12 GB of RAM is the minimum requirement. The dataset is also stored as a tensor so that it loads faster, so make sure that you have around 25 GB of free disk space.
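The 10 GB figure is consistent with a back-of-the-envelope estimate. The sketch below assumes that all 202,599 aligned CelebA images are held as 3x64x64 float32 tensors; the image size and data type are assumptions, not values stated in this README, and the helper name `tensor_size_gb` is hypothetical.

```python
def tensor_size_gb(n_images, channels=3, height=64, width=64, bytes_per_value=4):
    """Size in GB (10**9 bytes) of a dense image tensor."""
    return n_images * channels * height * width * bytes_per_value / 1e9


# Assumed: full CelebA (202,599 images), 3x64x64 pixels, 4 bytes per value.
celeba_gb = tensor_size_gb(202_599)
print(f"{celeba_gb:.2f} GB")  # prints "9.96 GB"
```

Changing `height`, `width` or `bytes_per_value` shows how quickly the requirement grows with higher resolution or precision.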

Preprocess

mkdir celebA; cd celebA

Download img_align_celeba.zip here under the link "Align&Cropped Images". You will also need list_attr_celeba.txt from the same link, found in the Anno folder.

unzip img_align_celeba.zip; cd ..
DATA_ROOT=celebA th data/preprocess_celebA.lua

Now move list_attr_celeba.txt to the celebA folder.

mv list_attr_celeba.txt celebA

Training

  • Conditional GAN: parameters are already configured to run CelebA (dataset=celebA, dataRoot=celebA).

     th trainGAN.lua
  • Generate encoder dataset:

     net=[GENERATOR_PATH] outputFolder=celebA/genDataset/ samples=182638 th data/generateEncoderDataset.lua

    (GENERATOR_PATH example: checkpoints/celebA_25_net_G.t7)

  • Train encoder Z:

     datasetPath=celebA/genDataset/ type=Z th trainEncoder.lua
    
  • Train encoder Y:

     datasetPath=celebA/ type=Y th trainEncoder.lua
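The four CelebA commands above can be chained from a small driver script. The following Python sketch only assembles the environment variables and `th` invocations listed above. The helper names `training_steps` and `run_all` are hypothetical and not part of the repository, and the generator checkpoint path is the example one from this README.

```python
import os
import subprocess


def training_steps(data_root, generator_path, samples, gan_options=None):
    """Return the four IcGAN training steps as (env_vars, command) pairs."""
    root = data_root.rstrip("/")
    gen_dataset = f"{root}/genDataset/"
    return [
        # 1. Conditional GAN (CelebA uses the defaults from cfg/mainConfig.lua)
        (dict(gan_options or {}), ["th", "trainGAN.lua"]),
        # 2. Dataset of generated images
        ({"net": generator_path, "outputFolder": gen_dataset, "samples": str(samples)},
         ["th", "data/generateEncoderDataset.lua"]),
        # 3. Encoder Z, trained on generated images
        ({"datasetPath": gen_dataset, "type": "Z"}, ["th", "trainEncoder.lua"]),
        # 4. Encoder Y, trained on real images
        ({"datasetPath": f"{root}/", "type": "Y"}, ["th", "trainEncoder.lua"]),
    ]


def run_all(steps, dry_run=True):
    """Print each command (dry run) or execute it with the extra env vars."""
    for env_vars, command in steps:
        if dry_run:
            parts = [f"{k}={v}" for k, v in env_vars.items()] + command
            print(" ".join(parts))
        else:
            subprocess.run(command, env={**os.environ, **env_vars}, check=True)


steps = training_steps("celebA", "checkpoints/celebA_25_net_G.t7", 182638)
run_all(steps)  # dry run: prints the commands without executing them
```

Calling `run_all(steps, dry_run=False)` would execute the steps in order and stop at the first failure, assuming `th` is on the PATH and the script is started from the repository root.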
    

1.2 Train with a digit dataset: MNIST

Preprocess

Download MNIST as a luarocks package: luarocks install mnist

Training

  • Conditional GAN:

     name=mnist dataset=mnist dataRoot=mnist th trainGAN.lua
  • Generate encoder dataset:

     net=[GENERATOR_PATH] outputFolder=mnist/genDataset/ samples=60000 th data/generateEncoderDataset.lua

    (GENERATOR_PATH example: checkpoints/mnist_25_net_G.t7)

  • Train encoder Z:

     datasetPath=mnist/genDataset/ type=Z th trainEncoder.lua
    
  • Train encoder Y:

     datasetPath=mnist type=Y th trainEncoder.lua
    

2. Pre-trained CelebA model

A pre-trained CelebA model is available for download here. The file includes the generator and both encoders (encoder Z and encoder Y).

3. Visualize the results

To visualize the results, you need an already trained IcGAN (i.e. a generator and two encoders). The parameters for generating results are in cfg/generateConfig.lua.

3.1 Reconstruct and modify real images

Reconstruction example

decNet=celeba_24_G.t7 encZnet=celeba_encZ_7.t7 encYnet=celeba_encY_5.t7 loadPath=[PATH_TO_REAL_IMAGES] th generation/reconstructWithVariations.lua

3.2 Swap attributes

Swap attributes

Swap the attribute information between two pairs of faces.

decNet=celeba_24_G.t7 encZnet=celeba_encZ_7.t7 encYnet=celeba_encY_5.t7 im1Path=[IM1] im2Path=[IM2] th generation/attributeTransfer.lua
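Conceptually, attribute swapping amounts to encoding both images and exchanging their y vectors before decoding. A minimal sketch, assuming hypothetical `encode_z`, `encode_y` and `decode` callables in place of the trained networks (this is not the code in generation/attributeTransfer.lua):

```python
def swap_attributes(x1, x2, encode_z, encode_y, decode):
    """Decode each image's z with the other image's attribute vector y."""
    z1, y1 = encode_z(x1), encode_y(x1)
    z2, y2 = encode_z(x2), encode_y(x2)
    return decode(z1, y2), decode(z2, y1)


# Toy stand-ins (hypothetical): an "image" is a (z, y) pair that the encoders read back.
toy_encode_z = lambda x: x[0]
toy_encode_y = lambda x: x[1]
toy_decode = lambda z, y: (z, y)

swapped_1, swapped_2 = swap_attributes(("face A", "smiling"), ("face B", "glasses"),
                                       toy_encode_z, toy_encode_y, toy_decode)
```

In the toy example each output keeps its own identity code and takes on the other input's attributes.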

3.3 Interpolate between faces

Interpolation

decNet=celeba_24_G.t7 encZnet=celeba_encZ_7.t7 encYnet=celeba_encY_5.t7 im1Path=[IM1] im2Path=[IM2] th generation/interpolate.lua
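Interpolation between two faces can be pictured as moving along a straight line between their encoded representations. The sketch below assumes linear interpolation applied to both z and y; the function names are hypothetical, and generation/interpolate.lua may differ in detail.

```python
def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors for t in [0, 1]."""
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]


def interpolate_codes(z1, y1, z2, y2, steps):
    """Return `steps` evenly spaced (z, y) pairs from image 1 to image 2."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ts = [i / (steps - 1) for i in range(steps)]
    return [(lerp(z1, z2, t), lerp(y1, y2, t)) for t in ts]


path = interpolate_codes([0.0, 2.0], [0.0], [1.0, 0.0], [1.0], steps=3)
# path[1] is the midpoint: z = [0.5, 1.0], y = [0.5]
```

Each (z, y) pair on the path would then be passed to the generator to produce one frame of the transition.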

Do you like or use our work? Please cite us as

@inproceedings{Perarnau2016,
  author    = {Guim Perarnau and
               Joost van de Weijer and
               Bogdan Raducanu and
               Jose M. {\'A}lvarez},
  title     = {{Invertible Conditional GANs for image editing}},
  booktitle = {NIPS Workshop on Adversarial Training},
  year      = {2016},
}