Learning What and Where to Draw

Last update: Nov 18, 2022

### Learning What and Where to Draw

Scott Reed, Zeynep Akata, Santosh Mohan, Samuel Tenka, Bernt Schiele, Honglak Lee

This is the code for our NIPS 2016 paper on text- and location-controllable image synthesis using conditional GANs. Much of the code is adapted from reedscot/icml2016 and dcgan.torch.

#### Setup Instructions

You will need to install Torch, CuDNN, stnbhwd and the display package.

#### How to train a text-to-image model:

1. Download the data, including captions, location annotations and pretrained models.
2. Download the birds and humans image data.
3. Modify the CONFIG file to point to your data.
4. Run one of the training scripts, e.g. ./scripts/train_cub_keypoints.sh
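
Before launching a long training run, it can help to confirm that Torch is on the PATH and that the files named in the steps above exist. The following is a minimal sketch, not part of the repository: the helper names `check_paths` and `have_cmd` are hypothetical, and it assumes that CONFIG sits at the repository root and that Torch is invoked through the `th` command.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; these are NOT shipped with the repository.

# Print each given path that does not exist.
# Returns non-zero if at least one path is missing.
check_paths() {
  local missing=0
  local p
  for p in "$@"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}

# Return success if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Example usage from the repository root (left commented out so that
# sourcing this file has no side effects). The location of CONFIG is
# an assumption; adjust it to match your checkout.
# have_cmd th || echo "Torch (th) not found on PATH"
# check_paths CONFIG ./scripts/train_cub_keypoints.sh \
#   && ./scripts/train_cub_keypoints.sh
```

The check only covers the files named in this README; the data directories that CONFIG points to can be passed to `check_paths` as additional arguments once you know where you placed them.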

#### How to generate samples:

Run ./scripts/run_all_demos.sh.
HTML files will be generated with results like the following:

Moving the bird's position via bounding box:

Moving the bird's position via keypoints:

Birds text to image with ground-truth keypoints:

Birds text to image with generated keypoints:

Humans text to image with ground-truth keypoints:

Humans text to image with generated keypoints:

#### Citation

If you find this useful, please cite our work as follows:

@inproceedings{reed2016learning,
  title={Learning What and Where to Draw},
  author={Scott Reed and Zeynep Akata and Santosh Mohan and Samuel Tenka and Bernt Schiele and Honglak Lee},
  booktitle={Advances in Neural Information Processing Systems},
  year={2016}
}

