Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Last update: Sep 11, 2022

Overview

VQGAN-CLIP-Docker

About

This is a stripped-down, minimal-dependency repository for running VQGAN+CLIP locally or in production.

For a Google Colab notebook see the original repository.

Samples

Setup

Clone this repository and cd into it.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset	Link
ImageNet (f=16), 16384	vqgan_imagenet_f16_16384

To run on a GPU, make sure CUDA is installed on your system (tested with CUDA 11.1+).

6 GB of VRAM is required to generate 256x256 images.
11 GB of VRAM is required to generate 512x512 images.
24 GB of VRAM is required to generate 1024x1024 images. (Untested)
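
The VRAM figures above can be turned into a quick pre-flight check. The sketch below is a hypothetical helper, not part of this repository; its thresholds are simply the numbers listed above, and the 1024x1024 figure remains untested.

```python
# Hypothetical helper: map available VRAM to the largest square size listed above.
# Thresholds are copied from the list above; the 24 GB / 1024x1024 entry is untested.
VRAM_REQUIREMENTS_GB = [(24, 1024), (11, 512), (6, 256)]


def largest_supported_size(vram_gb):
    """Return the largest listed square image size that fits in vram_gb, or None."""
    for required_gb, size in VRAM_REQUIREMENTS_GB:
        if vram_gb >= required_gb:
            return size
    return None


if __name__ == "__main__":
    for vram in (4, 8, 12, 24):
        print(vram, "GB ->", largest_supported_size(vram))
```

With PyTorch installed, the total memory of the first GPU can be read in bytes from torch.cuda.get_device_properties(0).total_memory and divided by 1024**3 before being passed to the helper.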

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, run the following command. It must print True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They are ready to use, but you may want to edit them to suit your needs. See the Configuration section for a description of each field.

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run in Docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run in Docker:

make generate-cpu
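
The DEVICE=cpu prefix above sets an environment variable for a single command. How scripts.generate reads that variable is not shown here, so the sketch below is only an assumption about how such an override is commonly handled. The function name resolve_device and the "cuda" default are hypothetical.

```python
import os


def resolve_device(cuda_available, env=None):
    """Pick a device name. Hypothetical logic, not the repository's actual code."""
    env = os.environ if env is None else env
    requested = env.get("DEVICE", "").strip().lower()
    if requested == "cpu":
        # Explicit override, as in `DEVICE=cpu python3 -m scripts.generate ...`
        return "cpu"
    # No override: use the GPU when one is available, otherwise fall back to CPU.
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    print(resolve_device(cuda_available=False))
```

Passing the environment as an argument keeps the function easy to test without modifying the real process environment.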

Configuration

Argument	Type	Description
`prompts`	List[str]	Text prompts
`image_prompts`	List[FilePath]	Image prompts / target image path
`max_iterations`	int	Number of iterations
`save_freq`	int	Interval, in iterations, between saved images
`size`	[int, int]	Image size (width, height)
`init_image`	FilePath	Initial image
`init_noise`	str	Initial noise image ['gradient','pixels']
`init_weight`	float	Initial weight
`output_dir`	FilePath	Path to output directory
`models_dir`	FilePath	Path to models cache directory
`clip_model`	FilePath	CLIP model path or name
`vqgan_checkpoint`	FilePath	VQGAN checkpoint path
`vqgan_config`	FilePath	VQGAN config path
`noise_prompt_seeds`	List[int]	Noise prompt seeds
`noise_prompt_weights`	List[float]	Noise prompt weights
`step_size`	float	Learning rate
`cutn`	int	Number of cuts
`cut_pow`	float	Cut power
`seed`	int	Seed (-1 for random seed)
`optimizer`	str	Optimizer ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
`augments`	List[str]	Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']
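
To make the table concrete, here is a sketch that builds a configuration from some of the fields above, checks it, and prints it as JSON. All values are illustrative assumptions, not the defaults shipped in ./configs/local.json, and the validate function is a hypothetical helper that is not part of this repository. The allowed values are taken from the table.

```python
import json

# Illustrative values only; check ./configs/local.json for the real defaults.
config = {
    "prompts": ["a lighthouse at sunset"],
    "image_prompts": [],
    "max_iterations": 250,
    "save_freq": 50,
    "size": [256, 256],
    "init_noise": "gradient",
    "output_dir": "./outputs",
    "models_dir": "./models",
    "step_size": 0.1,
    "cutn": 32,
    "cut_pow": 1.0,
    "seed": -1,
    "optimizer": "Adam",
    "augments": ["Af", "Pe", "Ji", "Er"],
}

# Allowed values as listed in the Configuration table.
ALLOWED_OPTIMIZERS = {"Adam", "AdamW", "Adagrad", "Adamax", "DiffGrad", "AdamP", "RAdam"}
ALLOWED_AUGMENTS = {"Ji", "Sh", "Gn", "Pe", "Ro", "Af", "Et", "Ts", "Cr", "Er", "Re"}


def validate(cfg):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if "optimizer" in cfg and cfg["optimizer"] not in ALLOWED_OPTIMIZERS:
        problems.append("unknown optimizer: %r" % cfg["optimizer"])
    unknown = [a for a in cfg.get("augments", []) if a not in ALLOWED_AUGMENTS]
    if unknown:
        problems.append("unknown augments: %r" % unknown)
    if "size" in cfg:
        size = cfg["size"]
        ok = (
            isinstance(size, list)
            and len(size) == 2
            and all(isinstance(v, int) and v > 0 for v in size)
        )
        if not ok:
            problems.append("size must be [width, height] with positive integers")
    return problems


if __name__ == "__main__":
    print(validate(config))
    print(json.dumps(config, indent=2))
```

Saving the printed JSON to a file and passing its path with -c, as in the Usage section, is one way to try a modified configuration without editing the provided files.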

Acknowledgments

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Owner

Kevin Costa
