Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Overview

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020)

Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

Main project page.

Generation of scenes with many objects. Our method achieves better performance on such scenes than previous methods. Left: A partial input scene graph. Middle: Generation using [1]. Right: Generation using our proposed method.

Our novel contributions are:

  1. We propose a model that uses canonical representations of SGs, thus obtaining stronger invariance properties. This in turn leads to generalization on semantically equivalent graphs and improved robustness to graph size and noise in comparison to existing methods.
  2. We show how to learn the canonicalization process from data.
  3. We use our canonical representations within an SG-to-image model and demonstrate our approach results in an improved generation on Visual Genome, COCO, and CLEVR, compared to the state-of-the-art baselines.

Dependencies

To get started with the framework, install the following dependencies:

Data

Follow the commands below to build the data.

COCO

./scripts/download_coco.sh

VG

./scripts/download_vg.sh

CLEVR

Please download the CLEVR-Dialog Dataset from here.

Training

Training a SG-to-Layout model:

python -m scripts.train --dataset={packed_coco, packed_vg, packed_clevr}  

Training AttSpade - Layout-to-Image model:

Optional arguments:

--output_dir=output_path_dir/%s (s is the run_name param) --run_name=folder_name --checkpoint_every=N (default=5000) --dataroot=datasets_path --debug (a flag for debug)

Train on COCO (with boxes):

python -m scripts.train --dataset=coco --batch_size=16 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --min_objects=1 --max_objects=1000 --gpu_ids=0 --use_cuda

Train on VG:

python -m scripts.train --dataset=vg --batch_size=16 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --min_objects=3 --max_objects=30 --gpu_ids=0 --use_cuda

Train on CLEVR:

python -m scripts.train --dataset=packed_clevr --batch_size=6 --loader_num_workers=0 --skip_graph_model=0 --skip_generation=0 --image_size=256,256 --use_img_disc=1 --gpu_ids=0 --use_cuda

Inference

Inference SG-to-Layout

To produce layout outputs and IOU results, run:

python -m scripts.layout_generation --checkpoint=<trained_model_folder> --gpu_ids=<0/1/2>

A new folder with the results will be created in: <trained_model_folder>

Pre-trained Models:

Packed COCO: link

Packed Visual Genome: link

Inference Layout-to-Image (LostGANs)

Please use LostGANs implementation

Inference Layout-to-Image (from dataframe)

To produce the image from a dataframe, run:

python -m scripts.generation_dataframe --checkpoint=<trained_model_folder>

A new folder with the results will be created in: <trained_model_folder>

Inference Layout-to-Image (AttSPADE)

COCO/ Visual Genome

  1. Generate images from a layout (dataframe):
python -m scripts.generation_dataframe --gpu_ids=<0/1/2> --checkpoint=<model_path> --output_dir=<output_path> --data_frame=<dataframe_path> --mode=<gt/pred>

mode=gt defines use gt_boxes while mode=pred use predicted box by our WSGC model from the paper (see the dataframe for more details).

Pre-trained Models:
COCO

dataframe: link; 128x128 resolution: link; 256x256 resolution: link

Visual Genome

dataframe: link; 128x128 resolution: link; 256x256 resolution: link

  1. Generate images from a scene graph:
python -m scripts.generation_attspade --gpu_ids=<0/1/2> --checkpoint=<model/path> --output_dir=<output_path>

CLEVR

This script generates CLEVR images on large scene graphs from scene_graphs.pkl. It generates the CLEVR results for both WSGC + AttSPADE and Sg2Im + AttSPADE. For more information, please refer to the paper.

python -m scripts.generate_clevr --gpu_ids=<0/1/2> --layout_not_learned_checkpoint=<model_path> --layout_learned_checkpoint=<model_path> --output_dir=<output_path>
Pre-trained Models:

Baseline (Sg2Im): link; WSGC: link

Acknowledgment

References

[1] Justin Johnson, Agrim Gupta, Li Fei-Fei, Image Generation from Scene Graphs, 2018.

Citation

@inproceedings{herzig2019canonical,
 author    = {Herzig, Roei and Bar, Amir and Xu, Huijuan and Chechik, Gal and Darrell, Trevor and Globerson, Amir},
 title     = {Learning Canonical Representations for Scene Graph to Image Generation},
 booktitle = {Proc. of the European Conf. on Computer Vision (ECCV)},
 year      = {2020}
}
Owner
roei_herzig
CS PhD student at Tel Aviv University. Algorithm Researcher, R&D at Nexar & Trax. Studied MSc (CS), BSc (CS) and BSc (Physics) at TAU.
roei_herzig
[ICCV21] Self-Calibrating Neural Radiance Fields

Self-Calibrating Neural Radiance Fields, ICCV, 2021 Project Page | Paper | Video Author Information Yoonwoo Jeong [Google Scholar] Seokjun Ahn [Google

381 Dec 30, 2022
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
Here is the diagnostic tool for BMVC 2021 paper Diagnosing Errors in Video Relation Detectors.

Here is the diagnostic tool for BMVC 2021 paper Diagnosing Errors in Video Relation Detectors. We provide a tiny ground truth file demo_gt.json, and t

Shuo Chen 3 Dec 26, 2022
Stock-Prediction - prediction of stock market movements using sentiment analysis and deep learning.

Stock-Prediction- In this project, we aim to enhance the prediction of stock market movements using sentiment analysis and deep learning. We divide th

5 Jan 25, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
converts nominal survey data into a numerical value based on a dictionary lookup.

SWAP RATE Converts nominal survey data into a numerical values based on a dictionary lookup. It allows the user to switch nominal scale data from text

Jake Rhodes 1 Jan 18, 2022
realsense d400 -> jpg + csv

Realsense-capture realsense d400 - jpg + csv Requirements RealSense sdk : Installation Python3 pyrealsense2 (RealSense SDK) Numpy OpenCV Tkinter Run

Ar-Ray 2 Mar 22, 2022
PyTorch implementation of neural style randomization for data augmentation

README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375

84 Nov 23, 2022
Package for working with hypernetworks in PyTorch.

Package for working with hypernetworks in PyTorch.

Christian Henning 71 Jan 05, 2023
LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations

LIMEcraft LIMEcraft: Handcrafted superpixel selectionand inspection for Visual eXplanations The LIMEcraft algorithm is an explanatory method based on

MI^2 DataLab 4 Aug 01, 2022
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021)

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion (CVPR 2021) An efficient PyTorch library for Point Cloud Completion.

Microsoft 119 Jan 02, 2023
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023
WRENCH: Weak supeRvision bENCHmark

🔧 What is it? Wrench is a benchmark platform containing diverse weak supervision tasks. It also provides a common and easy framework for development

Jieyu Zhang 176 Dec 28, 2022
PyTorch code for Composing Partial Differential Equations with Physics-Aware Neural Networks

FInite volume Neural Network (FINN) This repository contains the PyTorch code for models, training, and testing, and Python code for data generation t

Cognitive Modeling 20 Dec 18, 2022
Python package for missing-data imputation with deep learning

MIDASpy Overview MIDASpy is a Python package for multiply imputing missing data using deep learning methods. The MIDASpy algorithm offers significant

MIDASverse 77 Dec 03, 2022
Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness

Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness Code for Paper "Imbalanced Gradients: A Subtle Cause of Overestimated Adv

Hanxun Huang 11 Nov 30, 2022
Software & Hardware to do multi color printing with Sharpies

3D Print Colorizer is a combination of 3D printed parts and a Cura plugin which allows anyone with an Ender 3 like 3D printer to produce multi colored

343 Jan 06, 2023
A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS).

UniNAS A highly modular PyTorch framework with a focus on Neural Architecture Search (NAS). under development (which happens mostly on our internal Gi

Cognitive Systems Research Group 19 Nov 23, 2022
A PaddlePaddle implementation of STGCN with a few modifications in the model architecture in order to forecast traffic jam.

About This repository contains the code of a PaddlePaddle implementation of STGCN based on the paper Spatio-Temporal Graph Convolutional Networks: A D

Tianjian Li 1 Jan 11, 2022
Sound Source Localization for AI Grand Challenge 2021

Sound-Source-Localization Sound Source Localization study for AI Grand Challenge 2021 (sponsored by NC Soft Vision Lab) Preparation 1. Place the data-

sanghoon 19 Mar 29, 2022