Jax/Flax implementation of Variational-DiffWave.

Last update: Dec 16, 2022

Overview

jax-variational-diffwave

Jax/Flax implementation of Variational-DiffWave. (Zhifeng Kong et al., 2020, Diederik P. Kingma et al., 2021.)

DiffWave with Continuous-time Variational Diffusion Models.
DiffWave: A Versatile Diffusion Model for Audio Synthesis, Zhifeng Kong et al., 2020. [arXiv:2009.09761]
Variational Diffusion Models, Diederik P. Kingma et al., 2021. [arXiv:2107.00630]
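As a rough illustration of the continuous-time formulation in Variational Diffusion Models, the forward process draws z_t = alpha_t * x + sigma_t * eps with alpha_t^2 = sigmoid(-gamma(t)) and sigma_t^2 = sigmoid(gamma(t)), where gamma(t) is the negative log signal-to-noise ratio. The sketch below is not taken from this repository: the names `gamma_linear` and `diffuse`, the fixed linear schedule, and its endpoint values are assumptions made only for illustration.

```python
import jax
import jax.numpy as jnp


def gamma_linear(t, gamma_min=-10.0, gamma_max=10.0):
    """Hypothetical linear schedule for gamma(t) = -log SNR(t), t in [0, 1]."""
    return gamma_min + (gamma_max - gamma_min) * t


def diffuse(key, x, t):
    """Sample z_t ~ q(z_t | x) = N(alpha_t * x, sigma_t^2 * I).

    x: [B, T] clean waveform, t: [B] diffusion times in [0, 1].
    """
    gamma = gamma_linear(t)[:, None]          # [B, 1], broadcast over samples
    alpha = jnp.sqrt(jax.nn.sigmoid(-gamma))  # alpha_t^2 = sigmoid(-gamma)
    sigma = jnp.sqrt(jax.nn.sigmoid(gamma))   # sigma_t^2 = sigmoid(gamma)
    eps = jax.random.normal(key, x.shape)
    return alpha * x + sigma * eps, eps


key_x, key_t, key_eps = jax.random.split(jax.random.PRNGKey(0), 3)
# Stand-in audio batch (assumed shape [B, T]); real inputs would be waveforms.
x = jax.random.uniform(key_x, (2, 16000), minval=-1.0, maxval=1.0)
t = jax.random.uniform(key_t, (2,))
z_t, eps = diffuse(key_eps, x, t)
```

Because sigmoid(-gamma) + sigmoid(gamma) = 1, this parameterization is variance preserving: small t leaves the waveform almost untouched, while t near 1 yields nearly pure Gaussian noise.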

Requirements

Tested in a Python 3.7.9 conda environment; dependencies are listed in requirements.txt.

Usage

To train the model, run train.py.
Checkpoints are written to TrainConfig.ckpt and TensorBoard summaries to TrainConfig.log.

python train.py --data-dir /datasets/ljspeech --from-raw
tensorboard --logdir ./log/

To resume training from a previous checkpoint, pass --load-epoch together with the saved config.

python train.py --load-epoch 10 --config ./ckpt/l1.json

[WIP] To synthesize the test set, run synth.py.

python synth.py

[WIP] Pretrained checkpoints are released on the releases page.

To use a pretrained model, download the files and unzip them.
Check out the git repository at the matching commit tag; the following is a sample script.

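import json
import jax

# Config and VLBDiffWaveApp are defined in this repository.
with open('l1.json') as f:
    config = Config.load(json.load(f))

diffwave = VLBDiffWaveApp(config.model)
diffwave.restore('./l1/l1_99.ckpt')

# mel: [B, T, mel], a batch of mel-spectrograms
audio, _ = diffwave(mel, timesteps=50, key=jax.random.PRNGKey(0))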
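To listen to the result, the `audio` array returned by the sample script can be written to disk. The helper below is a minimal sketch, not part of this repository: `write_wav` is a hypothetical name, and it assumes each item of `audio` is a mono float waveform in [-1, 1] sampled at 22050 Hz (the native rate of LJSpeech). Check the model config for the actual sample rate.

```python
import wave

import numpy as np


def write_wav(path, audio, sample_rate=22050):
    """Write a mono float waveform in [-1, 1] to a 16-bit PCM WAV file."""
    audio = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    pcm = (audio * 32767.0).astype('<i2')  # little-endian int16
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())


# Stand-in for one item of the `audio` batch: one second of a 440 Hz tone.
sr = 22050
tone = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
write_wav('sample.wav', tone, sample_rate=sr)
```

With a real batch, something like `write_wav('sample.wav', audio[0])` would save the first item, provided the output follows the assumed [B, T] layout.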

Owner

YoungJoong Kim
