PyTorch implementation for reproducing StackGAN-v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

Last update: Dec 16, 2022

Overview

StackGAN-v2

PyTorch implementation for reproducing StackGAN-v2 results in the paper StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang*, Tao Xu*, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas.

Dependencies

Python 2.7

PyTorch

In addition, please add the project folder to PYTHONPATH and pip install the following packages:

tensorboard
python-dateutil
easydict
pandas
torchfile
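A quick way to confirm the setup before training is to check that each package imports and that the project folder is on PYTHONPATH. The sketch below is a hypothetical helper, not part of the repository; the pip-name-to-module mapping is an assumption (python-dateutil is the only package whose import name differs), and the code avoids f-strings so it also runs under Python 2.7.

```python
from __future__ import print_function

import importlib
import os

# pip package name -> module name it is imported as (assumed mapping).
REQUIRED = {
    "tensorboard": "tensorboard",
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
}


def missing_packages(required):
    """Return the pip names of packages whose module cannot be imported."""
    missing = []
    for pip_name, module_name in sorted(required.items()):
        try:
            importlib.import_module(module_name)
        except ImportError:
            missing.append(pip_name)
    return missing


def on_pythonpath(project_dir):
    """True if project_dir appears in the PYTHONPATH environment variable."""
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.abspath(project_dir)
    return any(os.path.abspath(e) == target for e in entries if e)


if __name__ == "__main__":
    # Assumes the script is run from the project folder.
    missing = missing_packages(REQUIRED)
    if missing:
        print("pip install " + " ".join(missing))
    if not on_pythonpath(os.getcwd()):
        print("Project folder is not on PYTHONPATH")
```

Printing a ready-to-run `pip install` line keeps the check read-only; it never installs anything itself.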

Data

Download our preprocessed char-CNN-RNN text embeddings for birds and save them to data/

[Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.

Download the birds image data. Extract them to data/birds/
Download the ImageNet dataset and extract the images to data/imagenet/
Download the LSUN dataset and save the images to data/lsun/
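Before launching a run, it can help to verify that the directories above exist and are not empty. A minimal sketch, assuming the paths are relative to the project folder; `check_data_dirs` is a hypothetical helper, and it does not check the embedding files, whose exact names under data/ are not specified here.

```python
from __future__ import print_function

import os

# Dataset directories named in the Data section; which ones you need
# depends on the dataset you train on.
EXPECTED_DIRS = {
    "birds": os.path.join("data", "birds"),
    "imagenet": os.path.join("data", "imagenet"),
    "lsun": os.path.join("data", "lsun"),
}


def check_data_dirs(project_root, expected=EXPECTED_DIRS):
    """Map each dataset name to True if its directory exists and is non-empty."""
    status = {}
    for name, rel_path in expected.items():
        path = os.path.join(project_root, rel_path)
        status[name] = os.path.isdir(path) and len(os.listdir(path)) > 0
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_data_dirs(".").items()):
        print("%-9s %s" % (name, "ok" if ok else "missing or empty"))
```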

Training

Train a StackGAN-v2 model on the bird (CUB) dataset using our preprocessed embeddings:
- python main.py --cfg cfg/birds_3stages.yml --gpu 0
Train a StackGAN-v2 model on the ImageNet dog subset:
- python main.py --cfg cfg/dog_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the ImageNet cat subset:
- python main.py --cfg cfg/cat_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the LSUN bedroom subset:
- python main.py --cfg cfg/bedroom_3stages_color.yml --gpu 0
Train a StackGAN-v2 model on the LSUN church subset:
- python main.py --cfg cfg/church_3stages_color.yml --gpu 0
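The five commands above differ only in the config file. The following is a hypothetical wrapper, not part of the repository, that builds the same `python main.py --cfg ... --gpu ...` invocations so several runs can be scripted; it assumes it is executed from the project folder and only prints the command unless `dry_run=False`.

```python
from __future__ import print_function

import subprocess

# Config files from the training commands above, keyed by dataset.
TRAIN_CFGS = {
    "birds": "cfg/birds_3stages.yml",
    "dog": "cfg/dog_3stages_color.yml",
    "cat": "cfg/cat_3stages_color.yml",
    "bedroom": "cfg/bedroom_3stages_color.yml",
    "church": "cfg/church_3stages_color.yml",
}


def train_command(dataset, gpu=0, python="python"):
    """Build the argv list for `python main.py --cfg <file> --gpu <id>`."""
    if dataset not in TRAIN_CFGS:
        raise ValueError("unknown dataset %r; choose from %s"
                         % (dataset, sorted(TRAIN_CFGS)))
    return [python, "main.py", "--cfg", TRAIN_CFGS[dataset], "--gpu", str(gpu)]


def launch(dataset, gpu=0, dry_run=True):
    """Print the command and, unless dry_run, run it (blocking)."""
    cmd = train_command(dataset, gpu)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```

Passing an argv list (rather than a shell string) to `subprocess.check_call` avoids shell quoting issues, and `check_call` raises if training exits with a non-zero status.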
*.yml files are example configuration files for training/evaluating our models.
If you want to try your own datasets, here are some good tips about how to train GANs. We also encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
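When adapting an example config to a new dataset, one common pattern is to start from a set of defaults and override only the hyper-parameters you want to change, rejecting unknown keys so that typos fail loudly. The sketch below illustrates that pattern on plain dicts; the key names and values in `EXAMPLE_DEFAULTS` are hypothetical placeholders, not the repository's actual config schema, and this is not a claim about how main.py reads its .yml files.

```python
from __future__ import print_function

import copy

# Hypothetical defaults with placeholder values, for illustration only;
# the real keys live in the example .yml files under cfg/.
EXAMPLE_DEFAULTS = {
    "DATASET_NAME": "birds",
    "TRAIN": {"BATCH_SIZE": 24, "LEARNING_RATE": 2e-4},
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied recursively.

    Unknown keys raise KeyError and type changes raise TypeError.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError("unknown config key: %s" % key)
        base = merged[key]
        if isinstance(base, dict):
            if not isinstance(value, dict):
                raise TypeError("%s must be a mapping" % key)
            merged[key] = merge_config(base, value)
        else:
            if base is not None and not isinstance(value, type(base)):
                raise TypeError("%s: expected %s, got %s"
                                % (key, type(base).__name__, type(value).__name__))
            merged[key] = value
    return merged


if __name__ == "__main__":
    custom = merge_config(EXAMPLE_DEFAULTS,
                          {"DATASET_NAME": "my_dataset", "TRAIN": {"BATCH_SIZE": 16}})
    print(custom)
```

In practice the overrides dict could come from `yaml.safe_load` on your own .yml file (this assumes PyYAML is installed). The defaults are deep-copied, so the original config is never mutated.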

Pretrained Model

StackGAN-v2 for birds. Download it and save it to models/ (inception score of this model: 4.04±0.05)
StackGAN-v2 for dogs. Download it and save it to models/ (inception score of this model: 9.55±0.11)
StackGAN-v2 for cats. Download it and save it to models/
StackGAN-v2 for bedrooms. Download it and save it to models/
StackGAN-v2 for churches. Download it and save it to models/
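The scores above are inception scores reported as mean ± standard deviation. As a rough illustration of how such a number is formed, here is a sketch of the standard definition IS = exp(E_x[KL(p(y|x) || p(y))]), evaluated on several splits of the samples. The default of 10 splits and the population standard deviation are assumptions following the common Improved-GAN convention; the sketch takes classifier probabilities as input and does not reproduce the authors' evaluation pipeline, Inception model, or sample count.

```python
from __future__ import division

import math


def split_score(probs):
    """exp(mean_x KL(p(y|x) || p(y))) for one group of probability rows."""
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal label distribution p(y), estimated as the mean of the rows.
    marginal = [sum(row[c] for row in probs) / n for c in range(num_classes)]
    total_kl = 0.0
    for row in probs:
        # Terms with p(y|x) = 0 contribute nothing; if p(y|x) > 0 then p(y) > 0.
        total_kl += sum(p * (math.log(p) - math.log(m))
                        for p, m in zip(row, marginal) if p > 0)
    return math.exp(total_kl / n)


def inception_score(probs, n_splits=10):
    """Return (mean, std) of split_score over n_splits contiguous splits.

    probs: per-image class-probability rows p(y|x), e.g. classifier softmax
    outputs on generated samples.
    """
    n = len(probs)
    if n < n_splits:
        raise ValueError("need at least one sample per split")
    scores = [split_score(probs[i * n // n_splits:(i + 1) * n // n_splits])
              for i in range(n_splits)]
    mean = sum(scores) / n_splits
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n_splits)
    return mean, std
```

Two sanity checks follow from the definition: if every sample gets the same uniform prediction the score is 1, and if predictions are confident and spread evenly over K classes the score approaches K, so higher values reward samples that are both recognizable and diverse.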

Evaluating

Run python main.py --cfg cfg/eval_birds.yml --gpu 1 to generate samples from captions in the birds validation set.
Modify the eval_*.yml files to generate images from the other pretrained models.
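Since each eval_*.yml file selects one pretrained model, listing those files gives the set of models you can sample from. A small sketch, assuming the files live in cfg/ and follow the eval_<name>.yml naming; `eval_configs` is a hypothetical helper, not part of the repository.

```python
from __future__ import print_function

import glob
import os


def eval_configs(cfg_dir="cfg"):
    """Map name -> path for every eval_<name>.yml file found in cfg_dir."""
    configs = {}
    for path in glob.glob(os.path.join(cfg_dir, "eval_*.yml")):
        # "eval_birds.yml" -> "birds"
        name = os.path.basename(path)[len("eval_"):-len(".yml")]
        configs[name] = path
    return configs


if __name__ == "__main__":
    # Prints one evaluation command per config, using GPU 1 as in the text.
    for name, path in sorted(eval_configs().items()):
        print("%s: python main.py --cfg %s --gpu 1" % (name, path))
```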

Examples generated by StackGAN-v2

t-SNE visualization of randomly generated birds, dogs, cats, churches and bedrooms

Citing StackGAN++

If you find StackGAN useful in your research, please consider citing:

@article{Han17stackgan2,
  author    = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
  title     = {StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks},
  journal   = {arXiv: 1710.10916},
  year      = {2017},
}

@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}

Our follow-up work

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

References

Generative Adversarial Text-to-Image Synthesis (Reed et al., ICML 2016)
Learning Deep Representations of Fine-grained Visual Descriptions (Reed et al., CVPR 2016)

Owner

Han Zhang
