v objective diffusion inference code for PyTorch.

Overview

v-diffusion-pytorch

v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman).

The models are denoising diffusion probabilistic models (https://arxiv.org/abs/2006.11239), which are trained to reverse a gradual noising process, allowing the models to generate samples from the learned data distributions starting from random noise. DDIM-style deterministic sampling (https://arxiv.org/abs/2010.02502) is also supported. The models are also trained on continuous timesteps. They use the 'v' objective from Progressive Distillation for Fast Sampling of Diffusion Models (https://openreview.net/forum?id=TIdIXIpzhoI).

Thank you to stability.ai for compute to train these models!

Dependencies

Model checkpoints:

  • CC12M 256x256, SHA-256 63946d1f6a1cb54b823df818c305d90a9c26611e594b5f208795864d5efe0d1f

A 602M parameter CLIP conditioned model trained on Conceptual 12M for 3.1M steps.

Sampling

Example

If the model checkpoints are stored in checkpoints/, the following will generate an image:

./clip_sample.py "the rise of consciousness" --model cc12m_1 --seed 0

If they are somewhere else, you need to specify the path to the checkpoint with --checkpoint.

CLIP conditioned/guided sampling

usage: clip_sample.py [-h] [--images [IMAGE ...]] [--batch-size BATCH_SIZE]
                      [--checkpoint CHECKPOINT] [--clip-guidance-scale CLIP_GUIDANCE_SCALE]
                      [--device DEVICE] [--eta ETA] [--model {cc12m_1}] [-n N] [--seed SEED]
                      [--steps STEPS]
                      [prompts ...]

prompts: the text prompts to use. Relative weights for text prompts can be specified by putting the weight after a colon, for example: "the rise of consciousness:0.5".

--batch-size: sample this many images at a time (default 1)

--checkpoint: manually specify the model checkpoint file

--clip-guidance-scale: how strongly the result should match the text prompt (default 500). If set to 0, the cc12m_1 model will still be CLIP conditioned and sampling will go faster and use less memory.

--device: the PyTorch device name to use (default autodetects)

--eta: set to 0 for deterministic (DDIM) sampling, 1 (the default) for stochastic (DDPM) sampling, and in between to interpolate between the two. DDIM is preferred for low numbers of timesteps.

--images: the image prompts to use (local files or HTTP(S) URLs). Relative weights for image prompts can be specified by putting the weight after a colon, for example: "image_1.png:0.5".

--model: specify the model to use (default cc12m_1)

-n: sample until this many images are sampled (default 1)

--seed: specify the random seed (default 0)

--steps: specify the number of diffusion timesteps (default is 1000, can lower for faster but lower quality sampling)

Comments
  • Generated images are completely black?! 😵 What am I doing wrong?

    Generated images are completely black?! 😵 What am I doing wrong?

    Hello, I am on Windows 10, and my gpu is a PNY Nvidia GTX 1660 TI 6 Gb. I installed V-Diffusion like so:

    • conda create --name v-diffusion python=3.8
    • conda activate v-diffusion
    • conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch (as per Pytorch website instructions)
    • pip install requests tqdm

    The problem is that when I launch the cfg_sample.py or clip_sample.py command lines, the generated images are completely black, although the inference process seems to run nicely and without errors.

    Things I've tried:

    • installing previous pytorch version with conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
    • removing V-Diffusion conda environment completely and recreating it anew
    • uninstalling nvidia drivers and performing a new clean driver install (I tried both Nvidia Studio drivers and Nvidia Game Ready drivers)
    • uninstalling and reinstalling Conda completely

    But nothing helped... and at this point I don't know what else to try...

    The only interesting piece of information I could gather is that for some reason this problem also happens with another text-to-image project called Big Sleep where similar to V-Diffusion the inference process appears to run correctly but the generated images are all black.

    I think there must be some simple detail I'm overlooking... which it's making me go insane... 😵 Please let me know something if you think you can help! THANKS !

    opened by illtellyoulater 10
  • what does this line mean in README?

    what does this line mean in README?

    A weight of 1 will sample images that match the prompt roughly as well as images usually match prompts like that in the training set.

    I can't wrap my head around this sentence. Could you please explain it with different wording? Thanks!

    opened by illtellyoulater 2
  • AttributeError: module 'torch' has no attribute 'special'

    AttributeError: module 'torch' has no attribute 'special'

    torch version: 1.8.1+cu111

    python ./cfg_sample.py "the rise of consciousness":5 -n 4 -bs 4 --seed 0 Using device: cuda:0 Traceback (most recent call last): File "./cfg_sample.py", line 154, in main() File "./cfg_sample.py", line 148, in main run_all(args.n, args.batch_size) File "./cfg_sample.py", line 136, in run_all steps = utils.get_spliced_ddpm_cosine_schedule(t) File "C:\Users\m\Desktop\v-diffusion-pytorch\diffusion\utils.py", line 75, in get_spliced_ddpm_cosine_schedule ddpm_part = get_ddpm_schedule(big_t + ddpm_crossover - cosine_crossover) File "C:\Users\m\Desktop\v-diffusion-pytorch\diffusion\utils.py", line 65, in get_ddpm_schedule log_snr = -torch.special.expm1(1e-4 + 10 * ddpm_t**2).log() AttributeError: module 'torch' has no attribute 'special'

    opened by tempdeltavalue 2
  • Add github action to automatically push to pypi on Release x.y.z commit

    Add github action to automatically push to pypi on Release x.y.z commit

    you need to create a token there https://pypi.org/manage/account/token/ and put it in there https://github.com/crowsonkb/v-diffusion-pytorch/settings/secrets/actions/new name it PYPI_PASSWORD

    The release will be triggered when you name your commit Release x.y.z I advise to change the version in setup.cfg in that commit

    opened by rom1504 0
  • [Question] What's the meaning of these equations in sample and cfg_model_fn(from sample.py )

    [Question] What's the meaning of these equations in sample and cfg_model_fn(from sample.py )

    Hello, thank you for your great work! I have a little puzzle in sample.py `# Get the model output (v, the predicted velocity) with torch.cuda.amp.autocast(): v = model(x, ts * steps[i], **extra_args).float()

        # Predict the noise and the denoised image
        pred = x * alphas[i] - v * sigmas[i]
        eps = x * sigmas[i] + v * alphas[i]`
    

    what the meaning ? Where it comes?

    opened by zhangquanwei962 0
  • Images don’t seem to evolve with each iteration

    Images don’t seem to evolve with each iteration

    Thanks for sharing such an amazing repo!

    I am testing a prompt like openAI “an astronaut riding a horse in a photorealistic style” to compare. But somehow the iterations seems to be stuck on the same image.

    This is my first test, so could very likely be that I am doing something wrong. Results and settings attached bellow…

    B5B8DE32-AF99-4D4C-BEB5-B9F131916845 2F55B7E2-7DB5-42CA-9B75-7384FDEB9303 B752B2AC-75A4-4F1C-A538-523B4249370E 6DC4FB56-9CDF-4F91-90A4-35C8F4D97FA5

    opened by alelordelo 0
  • [Question] Questions about `zero_embed` and `weights`

    [Question] Questions about `zero_embed` and `weights`

    Thanks for this great work. I'm recently interested in using diffusion model to generate images iteratively. I found your script cfg_sample.py was a nice implementation and I decided to learn from it. However, because I'm new in this field, I've encountered some problems quite hard to understand for me. It'd be great if some hints/suggestions are provided. Thank you!! My questions are listed below. They're about the script cfg_sample.py.

    1. I noticed in the codes, we've used zero_embed as one of the features for conditioning. What is the purpose of using it? Is it designed to allow the case of no prompt for input?
    2. I also notice that the weight of zero_embed is computed as 1 - sum(weights), I think the 1 is to make them sum to one, but actually the weight of zero_embed could be a negative number, should weights be normalized before all the intermediate noise maps are weighted?

    Thanks very much!!

    opened by Karbo123 4
  • Metrics on WikiArt model

    Metrics on WikiArt model

    Hi!

    I wanted to thank you for your work, especially since without you DiscoDiffusion wouldn't exist !

    Still, I was wondering if you had the metrics (Precision, Recall, FID and Inception Score) on the 256x256 WikiArt model ?

    opened by Maxim-Durand 0
  • Any idea on how to attach a clip model to a 64x64 unconditional model from openai/improved-diffusion?

    Any idea on how to attach a clip model to a 64x64 unconditional model from openai/improved-diffusion?

    Hey! love your work and been following your stuff for a while. I have finetuned a 64x64 unconditional model from openai/improved diffusion. checkpoint

    I was curious if you could lend any insight on how to connect CLIP guidance to my model? I have tried repurposing your notebook (https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj#scrollTo=1YwMUyt9LHG1) however past 100 steps, my models seems to unconverge.

    I think perhaps because there is too much noise being added for the smaller image size? How might i fix this?

    opened by DeepTitan 0
Releases(v0.0.2)
Owner
Katherine Crowson
AI/generative artist.
Katherine Crowson
Pytorch implementation of Decoupled Spatial-Temporal Transformer for Video Inpainting

Decoupled Spatial-Temporal Transformer for Video Inpainting By Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, J

51 Dec 13, 2022
Frigate - NVR With Realtime Object Detection for IP Cameras

A complete and local NVR designed for HomeAssistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras.

Blake Blackshear 6.4k Dec 31, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

274 Dec 06, 2022
Pytorch implementation of Deep Recursive Residual Network for Super Resolution (DRRN)

DRRN-pytorch This is an unofficial implementation of "Deep Recursive Residual Network for Super Resolution (DRRN)", CVPR 2017 in Pytorch. [Paper] You

yun_yang 192 Dec 12, 2022
PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop.

VoiceLoop PyTorch implementation of the method described in the paper VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop. VoiceLoop is a n

Meta Archive 873 Dec 15, 2022
mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms.

mbrl-lib is a toolbox for facilitating development of Model-Based Reinforcement Learning algorithms. It provides easily interchangeable modeling and planning components, and a set of utility function

Facebook Research 724 Jan 04, 2023
Implementation of ETSformer, state of the art time-series Transformer, in Pytorch

ETSformer - Pytorch Implementation of ETSformer, state of the art time-series Transformer, in Pytorch Install $ pip install etsformer-pytorch Usage im

Phil Wang 121 Dec 30, 2022
PyTorch implementation of DirectCLR from paper Understanding Dimensional Collapse in Contrastive Self-supervised Learning

DirectCLR DirectCLR is a simple contrastive learning model for visual representation learning. It does not require a trainable projector as SimCLR. It

Meta Research 49 Dec 21, 2022
Release of the ConditionalQA dataset

ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset

14 Oct 17, 2022
COPA-SSE contains crowdsourced explanations for the Balanced COPA dataset

COPA-SSE Repository for COPA-SSE: Semi-Structured Explanations for Commonsense Reasoning. COPA-SSE contains crowdsourced explanations for the Balanced

Ana Brassard 5 Jul 31, 2022
An intelligent, flexible grammar of machine learning.

An english representation of machine learning. Modify what you want, let us handle the rest. Overview Nylon is a python library that lets you customiz

Palash Shah 79 Dec 02, 2022
PyTorch common framework to accelerate network implementation, training and validation

pytorch-framework PyTorch common framework to accelerate network implementation, training and validation. This framework is inspired by works from MML

Dongliang Cao 3 Dec 19, 2022
A pytorch &keras implementation and demo of Fastformer.

Fastformer Notes from the authors Pytorch/Keras implementation of Fastformer. The keras version only includes the core fastformer attention part. The

153 Dec 28, 2022
Codes and Data Processing Files for our paper.

Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE

Chaoqi Yang 18 Dec 12, 2022
Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

dblmahmc Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo" Requirements: https://github.com

1 Dec 17, 2021
AlphaBot2 Pi Core software for interfacing with the various components.

AlphaBot2-Pi-Core AlphaBot2 Pi Core software for interfacing with the various components. This project is currently a W.I.P. I will update this readme

KyleDev 1 Feb 13, 2022
The first public PyTorch implementation of Attentive Recurrent Comparators

arc-pytorch PyTorch implementation of Attentive Recurrent Comparators by Shyam et al. A blog explaining Attentive Recurrent Comparators Visualizing At

Sanyam Agarwal 150 Oct 14, 2022
Yolov5-lite - Minimal PyTorch implementation of YOLOv5

Yolov5-Lite: Minimal YOLOv5 + Deep Sort Overview This repo is a shortened versio

Kadir Nar 57 Nov 28, 2022
A task-agnostic vision-language architecture as a step towards General Purpose Vision

Towards General Purpose Vision Systems By Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, and Derek Hoiem Overview Welcome to the official code base f

AI2 79 Dec 23, 2022