Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Pytorch implementation (inofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1. from paper

Note: this is work in progress.

Everyone is happily invited to contribute --> Discord Channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. The model is a billion parameter model. Training it requires a lot of compute. If anyone can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It makes heavy use of using semantic segmentation maps for extra conditioning. This enables more influence on the generation process. Morever, it also conditions on text. The main improvements are the following:

Segmentation condition: separate VQVAE is trained (VQ-SEG) + loss modified to a weighted binary cross entropy. (3.4)
VQGAN training (VQ-IMG) is extended by Face-Loss & Object-Loss (3.3 & 3.5)
Classifier Guidance for the autoregressive transformer (3.7)

Training Pipeline

Figure 6. from paper

What needs to be done?

Refer to the different folders to see details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Pytorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Related tags

Overview

Make-A-Scene - PyTorch

Note: this is work in progress.

Paper Description:

Training Pipeline

What needs to be done?

Citation

Owner

Casual GAN Papers

[ICCV 2021 Oral] SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

Paddle implementation for "Highly Efficient Knowledge Graph Embedding Learning with Closed-Form Orthogonal Procrustes Analysis" (NAACL 2021)

Official implementation for “Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior”

This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

Rainbow DQN implementation that outperforms the paper's results on 40% of games using 20x less data 🌈

OpenLT: An open-source project for long-tail classification

A library for optimization on Riemannian manifolds

[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

Implementation of CaiT models in TensorFlow and ImageNet-1k checkpoints. Includes code for inference and fine-tuning.

Paper list of log-based anomaly detection

Label-Free Model Evaluation with Semi-Structured Dataset Representations

Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

A torch.Tensor-like DataFrame library supporting multiple execution runtimes and Arrow as a common memory format

A general 3D Object Detection codebase in PyTorch.

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals, CVPR2021

ExCon: Explanation-driven Supervised Contrastive Learning

On-device speech-to-intent engine powered by deep learning

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

Implementation of PersonaGPT Dialog Model