Implementation of Diverse Semantic Image Synthesis via Probability Distribution Modeling

Related tags

Deep LearningINADE
Overview

Diverse Semantic Image Synthesis via Probability Distribution Modeling (CVPR 2021)

Architecture

Paper

Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, [Bin Liu], Gang Hua, Nenghai Yu

Abstract

Semantic image synthesis, translating semantic layouts to photo-realistic images, is a one-to-many mapping problem. Though impressive progress has been recently made, diverse semantic synthesis that can efficiently produce semantic-level multimodal results, still remains a challenge. In this paper, we propose a novel diverse semantic image synthesis framework from the perspective of semantic class distributions, which naturally supports diverse generation at semantic or even instance level. We achieve this by modeling class-level conditional modulation parameters as continuous probability distributions instead of discrete values, and sampling per-instance modulation parameters through instance-adaptive stochastic sampling that is consistent across the network. Moreover, we propose prior noise remapping, through linear perturbation parameters encoded from paired references, to facilitate supervised training and exemplar-based instance style control at test time. Extensive experiments on multiple datasets show that our method can achieve superior diversity and comparable quality compared to state-of-the-art methods.

Installation

Clone this repo.

git clone https://github.com/tzt101/INADE.git
cd INADE/

This code requires PyTorch 1.6 and python 3+. Please install dependencies by

pip install -r requirements.txt

Dataset Preparation

The Cityscapes and ADE20K dataset can be downloaded and prepared following SPADE. The CelebAMask-HQ can be downloaded from CelebAMask-HQ, you need to to integrate the separated annotations into an image file (the format like other datasets, e.g. Cityscapes and ADE20K). The DeepFashion can be downloaded from SMIS, and the version with two persons can be downloaded from OneDrive.

To make or reid the instance map, you can use the following commands:

python make_instances.py --path [Path_to_dataset] --dataset [ade20k | cityscapes | celeba | deepfashion]

Generating Images Using Pretrained Model

Once the dataset is ready, the result images can be generated using pretrained models.

  1. Download the pretrained models from the OneDrive, save it in checkpoints/. The structure is as follows:
./checkpoints/
    inade_ade20k/
        best_net_G.pth
        best_net_IE.pth
    inade_celeba/
        best_net_G.pth
        best_net_IE.pth
    inade_cityscapes/
        best_net_G.pth
        best_net_IE.pth
    inade_deepfashion/
        best_net_G.pth
        best_net_IE.pth

The noise_nc is 64 for all pretrained models except which on deepfashion (set to 8). Because we find that it's enough for quality and diversity.

  1. Generate the images on the test dataset.
python test.py --name [model_name] --norm_mode inade --batchSize 1 --gpu_ids 0 --which_epoch best --dataset_mode [dataset] --dataroot [Path_to_dataset]

[model_name] is the directory name of the checkpoint file downloaded in Step 1, such as inade_ade20k and inade_cityscapes. [dataset] can be on of ade20k, celeba, cityscapes and deepfashion. [Path_to_dataset] is the path to the dataset. If you want to use encoder, you can add the another option --use_vae.

Training New Models

You can train your own model with the following command:

# To train CLADE and CLADE-ICPE.
python train.py --name [experiment_name] --dataset_mode [dataset] --norm_mode inade --use_vae --dataroot [Path_to_dataset]

If you want to test the model during the training step, please set --train_eval. By default, the model every 10 epoch will be test in terms of FID. Finally, the model with best FID score will be saved as best_net_G.pth.

Calculate FID

We provide the code to calculate the FID which is based on rpo. We have pre-calculated the distribution of real images (all images are resized to 256×256 except cityscapes is 512×256) in training set of each dataset and saved them in ./datasets/train_mu_si/. You can run the following command:

python fid_score.py [Path_to_real_image] [Path_to_fake_image] --batch-size 1 --gpu 0 --load_np_name [dataset] --resize_size [Size]

The provided [dataset] are: ade20k, celeba, cityscapes, coco and deepfashion. You can save the new dataset by replacing --load_np_name [dataset] with --save_np_name [dataset].

New Useful Options

The new options are as follows:

  • --use_amp: if specified, use AMP training mode.
  • --train_eval: if sepcified, evaluate the model during training.
  • --eval_dims: the default setting is 2048, Dimensionality of Inception features to use.
  • --eval_epoch_freq: the default setting is 10, frequency of calculate fid score at the end of epochs.

Code Structure

  • train.py, test.py: the entry point for training and testing.
  • trainers/pix2pix_trainer.py: harnesses and reports the progress of training.
  • models/pix2pix_model.py: creates the networks, and compute the losses
  • models/networks/: defines the architecture of all models
  • options/: creates option lists using argparse package. More individuals are dynamically added in other files as well. Please see the section below.
  • data/: defines the class for loading images and label maps.

Citation

If you use this code for your research, please cite our papers.

@inproceedings{tan2021diverse,
  title={Diverse Semantic Image Synthesis via Probability Distribution Modeling},
  author={Tan, Zhentao and Chai, Menglei and Chen, Dongdong and Liao, Jing and Chu, Qi and Liu, Bin and Hua, Gang and Yu, Nenghai},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7962--7971},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE.

Hybrid CenterNet - Hybrid-supervised object detection / Weakly semi-supervised object detection

Hybrid-Supervised Object Detection System Object detection system trained by hybrid-supervision/weakly semi-supervision (HSOD/WSSOD): This project is

5 Dec 10, 2022
Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Text2Music Emotion Embedding Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings Reference Emotion Embedding Spaces for Matching

Minz Won 50 Dec 05, 2022
kullanışlı ve işinizi kolaylaştıracak bir araç

Hey merhaba! işte çok sorulan sorularının cevabı ve sorunlarının çözümü; Soru= İçinde var denilen birçok şeyi göremiyorum bunun sebebi nedir? Cevap= B

Sexettin 16 Dec 17, 2022
An open source app to help calm you down when needed.

By: Seanpm2001, Et; Al. Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afrikaans |

Sean P. Myrick V19.1.7.2 2 Oct 24, 2022
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model

Deep Cognition and Language Research (DeCLaRe) Lab 398 Dec 30, 2022
[NeurIPS 2021] Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples | ⛰️⚠️

Towards Better Understanding of Training Certifiably Robust Models against Adversarial Examples This repository is the official implementation of "Tow

Sungyoon Lee 4 Jul 12, 2022
Towards Implicit Text-Guided 3D Shape Generation (CVPR2022)

Towards Implicit Text-Guided 3D Shape Generation Towards Implicit Text-Guided 3D Shape Generation (CVPR2022) Code for the paper [Towards Implicit Text

55 Dec 16, 2022
MILK: Machine Learning Toolkit

MILK: MACHINE LEARNING TOOLKIT Machine Learning in Python Milk is a machine learning toolkit in Python. Its focus is on supervised classification with

Luis Pedro Coelho 610 Dec 14, 2022
Learning Logic Rules for Document-Level Relation Extraction

LogiRE Learning Logic Rules for Document-Level Relation Extraction We propose to introduce logic rules to tackle the challenges of doc-level RE. Equip

41 Dec 26, 2022
Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Fbone (Flask bone) is a Flask (Python microframework) starter/template/bootstrap/boilerplate application.

Wilson 1.7k Dec 30, 2022
《DeepViT: Towards Deeper Vision Transformer》(2021)

DeepViT This repo is the official implementation of "DeepViT: Towards Deeper Vision Transformer". The repo is based on the timm library (https://githu

109 Dec 02, 2022
PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, wav2lip, picture repair, image editing, photo2cartoon, image style transfer, and so on.

English | 简体中文 PaddleGAN PaddleGAN provides developers with high-performance implementation of classic and SOTA Generative Adversarial Networks, and s

6.4k Jan 09, 2023
GND-Nets (Graph Neural Diffusion Networks) in TensorFlow.

GNDC For submission to IEEE TKDE. Overview Here we provide the implementation of GND-Nets (Graph Neural Diffusion Networks) in TensorFlow. The reposit

Wei Ye 3 Aug 08, 2022
Perform zero-order Hankel Transform for an 1D array (float or real valued).

perform zero-order Hankel Transform for an 1D array (float or real valued). An discrete form of Parseval theorem is guaranteed. Suit for iterative problems.

1 Jan 17, 2022
🚀 PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)"

PyTorch Implementation of "Progressive Distillation for Fast Sampling of Diffusion Models(v-diffusion)" Unofficial PyTorch Implementation of Progressi

Vitaliy Hramchenko 58 Dec 19, 2022
A TensorFlow implementation of Neural Program Synthesis from Diverse Demonstration Videos

ViZDoom http://vizdoom.cs.put.edu.pl ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is pri

Hyeonwoo Noh 1 Aug 19, 2020
MakeItTalk: Speaker-Aware Talking-Head Animation

MakeItTalk: Speaker-Aware Talking-Head Animation This is the code repository implementing the paper: MakeItTalk: Speaker-Aware Talking-Head Animation

Adobe Research 285 Jan 08, 2023
Streamlit App For Product Analysis - Streamlit App For Product Analysis

Streamlit_App_For_Product_Analysis Здравствуйте! Перед вами дашборд, позволяющий

Grigory Sirotkin 1 Jan 10, 2022
Implementations of CNNs, RNNs, GANs, etc

Tensorflow Programs and Tutorials This repository will contain Tensorflow tutorials on a lot of the most popular deep learning concepts. It'll also co

Adit Deshpande 1k Dec 30, 2022
Dynamic Attentive Graph Learning for Image Restoration, ICCV2021 [PyTorch Code]

Dynamic Attentive Graph Learning for Image Restoration This repository is for GATIR introduced in the following paper: Chong Mou, Jian Zhang, Zhuoyuan

Jian Zhang 84 Dec 09, 2022