NP DRAW paper released code

Related tags

Deep LearningNPDRAW
Overview

NP-DRAW: A Non-Parametric Structured Latent Variable Model for Image Generation

This repo contains the official implementation for the NP-DRAW paper.

by Xiaohui Zeng, Raquel Urtasun, Richard Zemel, Sanja Fidler, and Renjie Liao

Abstract

In this paper, we present a non-parametric structured latent variable model for image generation, called NP-DRAW, which sequentially draws on a latent canvas in a part-by-part fashion and then decodes the image from the canvas. Our key contributions are as follows.

  1. We propose a non-parametric prior distribution over the appearance of image parts so that the latent variable “what-to-draw” per step becomes a categorical random variable. This improves the expressiveness and greatly eases the learning compared to Gaussians used in the literature.
  2. We model the sequential dependency structure of parts via a Transformer, which is more powerful and easier to train compared to RNNs used in the literature.
  3. We propose an effective heuristic parsing algorithm to pre-train the prior. Experiments on MNIST, Omniglot, CIFAR-10, and CelebA show that our method significantly outperforms previous structured image models like DRAW and AIR and is competitive to other generic generative models.

Moreover, we show that our model’s inherent compositionality and interpretability bring significant benefits in the low-data learning regime and latent space editing.

Generation Process

prior

Our prior generate "whether", "where" and "what" to draw per step. If the "whether-to-draw" is true, a patch from the part bank is selected and pasted to the canvas. The final canvas is refined by our decoder.

More visualization of the canvas and images

twitter-1page

Latent Space Editting

We demonstrate the advantage of our interpretable latent space via interactively editing/composing the latent canvas.

edit

  • Given images A and B, we encode them to obtain the latent canvases. Then we compose a new canvas by placing certain semantically meaningful parts (e.g., eyeglasses, hair, beard, face) from canvas B on top of canvas A. Finally, we decode an image using the composed canvas.

Dependencies

# the following command will install torch 1.6.0 and other required packages 
conda env create -f environment.yml # edit the last link in the yml file for the directory
conda activate npdraw 

Pretrained Model

Pretrained model will be available here To use the pretrained models, download the zip file under exp folder and unzip it. For expample, with the cifar.zip file we will get ./exp/cifarcm/cat_vloc_at/ and ./exp/cnn_prior/cifar/.

Testing the pretrained NPDRAW model:

  • before running the evaluation, please also download the stats on the test set from google-drive, and run
mkdir datasets 
mv images.tar.gz datasets 
cd datasets 
tar xzf images.tar.gz 

The following commands test the FID score of the NPDRAW model.

# for mnist
bash scripts/local_sample.sh exp/stoch_mnist/cat_vloc_at/0208/p5s5n36vitBinkl1r1E3_K50w5sc0_gs_difflr_b500/E00550.pth # FID 2.55

# for omniglot
bash scripts/local_sample.sh exp/omni/cat_vloc_at/0208/p5s5n36vitBinkl1r1E3_K50w5sc0_gs_difflr_b500/ckpt_epo799.pth # FID 5.53

# for cifar
bash scripts/local_sample.sh exp/cifarcm/cat_vloc_at/0208/p4s4n64_vitcnnLkl11E3_K200w4sc0_gs_difflr_b150/ckpt_epo499.pth #

# for celeba
bash scripts/local_sample.sh exp/celebac32/cat_vloc_at/0208/p4s4n64_vitcnnLkl0e531E3_K200w4sc0_gs_difflr_b150/ckpt_epo199.pth # FID 41.29

Training

Use ./scripts/train_$DATASET.sh to train the model.


  • The code in tool/pytorch-fid/ is adapated from here
  • The transformer code is adapted from here
Owner
ZENG Xiaohui
ZENG Xiaohui
PyTorch code for our paper "Image Super-Resolution with Non-Local Sparse Attention" (CVPR2021).

Image Super-Resolution with Non-Local Sparse Attention This repository is for NLSN introduced in the following paper "Image Super-Resolution with Non-

143 Dec 28, 2022
Official implementation of the NeurIPS'21 paper 'Conditional Generation Using Polynomial Expansions'.

Conditional Generation Using Polynomial Expansions Official implementation of the conditional image generation experiments as described on the NeurIPS

Grigoris 4 Aug 07, 2022
Deep learning based hand gesture recognition using LSTM and MediaPipie.

Hand Gesture Recognition Deep learning based hand gesture recognition using LSTM and MediaPipie. Demo video using PingPong Robot Files Pretrained mode

Brad 24 Nov 11, 2022
A minimalist environment for decision-making in autonomous driving

highway-env A collection of environments for autonomous driving and tactical decision-making tasks An episode of one of the environments available in

Edouard Leurent 1.6k Jan 07, 2023
COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models

COVID-ViT COVID-VIT: Classification of Covid-19 from CT chest images based on vision transformer models This code is to response to te MIA-COV19 compe

17 Dec 30, 2022
NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

NU-Wave — Official PyTorch Implementation NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling Junhyeok Lee, Seungu Han @ MINDsLab Inc

MINDs Lab 242 Dec 23, 2022
Face recognize system

FRS Face_recognize_system This project contains my work that target on solving some problems of FRS: Face detection: Retinaface Face anti-spoofing: Fo

Tran Anh Tuan 4 Nov 18, 2021
Like Dirt-Samples, but cleaned up

Clean-Samples Like Dirt-Samples, but cleaned up, with clear provenance and license info (generally a permissive creative commons licence but check the

TidalCycles 39 Nov 30, 2022
Normalizing Flows with a resampled base distribution

Resampling Base Distributions of Normalizing Flows Normalizing flows are a popular class of models for approximating probability distributions. Howeve

Vincent Stimper 24 Nov 03, 2022
Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

Implementation based on Paper - Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling

HamasKhan 3 Jul 08, 2022
v objective diffusion inference code for PyTorch.

v-diffusion-pytorch v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The

Katherine Crowson 635 Dec 30, 2022
DyNet: The Dynamic Neural Network Toolkit

The Dynamic Neural Network Toolkit General Installation C++ Python Getting Started Citing Releases and Contributing General DyNet is a neural network

Chris Dyer's lab @ LTI/CMU 3.3k Jan 06, 2023
Scalable Multi-Agent Reinforcement Learning

Scalable Multi-Agent Reinforcement Learning 1. Featured algorithms: Value Function Factorization with Variable Agent Sub-Teams (VAST) [1] 2. Implement

3 Aug 02, 2022
Official implementations of PSENet, PAN and PAN++.

News (2021/11/03) Paddle implementation of PAN, see Paddle-PANet. Thanks @simplify23. (2021/04/08) PSENet and PAN are included in MMOCR. Introduction

395 Dec 14, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
CVPR 2022 "Online Convolutional Re-parameterization"

OREPA: Online Convolutional Re-parameterization This repo is the PyTorch implementation of our paper to appear in CVPR2022 on "Online Convolutional Re

Mu Hu 121 Dec 21, 2022
Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

41 Jan 06, 2023
A fast Protein Chain / Ligand Extractor and organizer.

Are you tired of using visualization software, or full blown suites just to separate protein chains / ligands ? Are you tired of organizing the mess o

Amine Abdz 9 Nov 06, 2022
Real time sign language recognition

The proposed work aims at converting american sign language gestures into English that can be understood by everyone in real time.

Mohit Kaushik 6 Jun 13, 2022
YOLOv5 in PyTorch > ONNX > CoreML > TFLite

This repository represents Ultralytics open-source research into future object detection methods, and incorporates lessons learned and best practices evolved over thousands of hours of training and e

Ultralytics 34.1k Dec 31, 2022