Text-to-Image Translation (DALL-E) for TPU in PyTorch

A refactoring of Taming Transformers and DALLE-pytorch for TPU VMs, built on PyTorch Lightning.

Requirements

pip install -r requirements.txt

Data Preparation

Use any image dataset arranged in an ImageNet-style directory structure, with the images placed inside at least one subfolder of the dataset root, so that it can be loaded with torchvision's ImageFolder.
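The layout requirement can be checked before launching a long run. The following is a minimal sketch using only the standard library; the helper names (find_class_folders, check_dataset_root) and the extension list are assumptions made for illustration and are not part of this repository or of torchvision.

```python
import os

# Extensions treated as images in this sketch. This is an assumption,
# not necessarily the exact set that torchvision's ImageFolder accepts.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".webp")


def find_class_folders(root):
    """Return {subfolder_name: image_count} for subfolders of root that hold images."""
    classes = {}
    for entry in sorted(os.listdir(root)):
        class_dir = os.path.join(root, entry)
        if not os.path.isdir(class_dir):
            continue  # files directly under root do not belong to any class
        count = 0
        # Walk nested folders too, counting files with an image extension.
        for _dirpath, _dirnames, filenames in os.walk(class_dir):
            count += sum(
                1 for name in filenames if name.lower().endswith(IMAGE_EXTENSIONS)
            )
        if count:
            classes[entry] = count
    return classes


def check_dataset_root(root):
    """Raise ValueError if root has no subfolder containing images."""
    classes = find_class_folders(root)
    if not classes:
        raise ValueError(f"{root!r} has no subfolder containing images")
    return classes
```

For example, a root holding cats/a.jpg and cats/b.png would pass the check with one class folder, while a root holding only loose image files would be rejected.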

Training VQVAEs

You can quickly test the training script with randomly generated fake data:

python train_vae.py --use_tpus --fake_data

For actual training, provide the directories for train_dir, val_dir, and log_dir:

python train_vae.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results]

Training DALL-E

python train_dalle.py --use_tpus --train_dir [training_set] --val_dir [val_set] --log_dir [where to save results] --vae_path [pretrained vae] --bpe_path [pretrained bpe(optional)]
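When the run is launched from another script, the command above can be assembled programmatically. This is only a sketch: build_dalle_command is a hypothetical helper, the example paths are placeholders, and it assumes that --use_tpus is a plain on/off switch and that --bpe_path can simply be left out, as the command line above suggests.

```python
import shlex


def build_dalle_command(train_dir, val_dir, log_dir, vae_path,
                        bpe_path=None, use_tpus=True):
    """Assemble the train_dalle.py invocation as an argument list."""
    cmd = ["python", "train_dalle.py"]
    if use_tpus:
        cmd.append("--use_tpus")  # assumed to be a switch that takes no value
    cmd += [
        "--train_dir", train_dir,
        "--val_dir", val_dir,
        "--log_dir", log_dir,
        "--vae_path", vae_path,
    ]
    if bpe_path is not None:
        cmd += ["--bpe_path", bpe_path]  # optional, omitted when not given
    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    cmd = build_dalle_command("data/train", "data/val", "logs/dalle", "vae.ckpt")
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell-quoting problems when a path contains spaces.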

TODO

Refactor Encoder and Decoder modules for better readability
Refactor VQVAE2
Add Net2Net Conditional Transformer for conditional image generation
Refactor, optimize, and merge DALL-E with Net2Net Conditional Transformer
Add Guided Diffusion + CLIP for image refinement
Add VAE converter for JAX to support dalle-mini
Add DALL-E Colab notebook
Add RBGumbelQuantizer
Add HiT

ON-GOING

Test large dataset loading on TPU Pods
Update the current DALL-E code to fully support the latest changes in DALLE-pytorch

DONE

BibTeX

@misc{oord2018neural,
      title={Neural Discrete Representation Learning}, 
      author={Aaron van den Oord and Oriol Vinyals and Koray Kavukcuoglu},
      year={2018},
      eprint={1711.00937},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{razavi2019generating,
      title={Generating Diverse High-Fidelity Images with VQ-VAE-2}, 
      author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
      year={2019},
      eprint={1906.00446},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis}, 
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
      title={Zero-Shot Text-to-Image Generation},
      author={Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
      year={2021},
      eprint={2102.12092},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Owner

Kim, Taehoon
