NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling @ INTERSPEECH 2021 Accepted

Last update: Dec 23, 2022

Overview

NU-Wave — Official PyTorch Implementation

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling
Junhyeok Lee, Seungu Han @ MINDsLab Inc., SNU

Paper(arXiv): https://arxiv.org/abs/2104.02321 (Accepted to INTERSPEECH 2021)
Audio Samples: https://mindslab-ai.github.io/nuwave

Official PyTorch + Lightning implementation of NU-Wave.

Update: CODE RELEASED! README is DONE.

Requirements

PyTorch >= 1.7.0 (required for nn.SiLU, the swish activation)
PyTorch-Lightning == 1.1.6
The full list of requirements is in requirements.txt.
We also provide a Dockerfile for a Docker setup.
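For reference, the swish activation provided by nn.SiLU computes x * sigmoid(x). The pure-Python sketch below only illustrates that formula. The function name `silu` is chosen here for illustration and is not part of this repository.

```python
import math


def silu(x: float) -> float:
    """Swish / SiLU activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        # sigmoid(x) = 1 / (1 + exp(-x)); exp(-x) <= 1 here, so no overflow.
        return x / (1.0 + math.exp(-x))
    # For negative x use sigmoid(x) = exp(x) / (1 + exp(x)); exp(x) < 1 here.
    e = math.exp(x)
    return x * e / (1.0 + e)


print(silu(0.0))  # 0.0
```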

Preprocessing

Before running the project, download the dataset and preprocess it into .pt files:

1. Download the VCTK dataset.
2. Remove speakers p280 and p315.
3. Set data:dir in hparameters.yaml to the path of the downloaded dataset.
4. Run utils/wav2pt.py.

$ python utils/wav2pt.py
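The speaker-removal step can also be checked without deleting anything. The following is a minimal sketch that assumes the dataset is laid out as one sub-directory per speaker (dir/spk/...). The helper `usable_speakers` is hypothetical and is not part of utils/wav2pt.py.

```python
from pathlib import Path

# Speakers the preprocessing steps say to remove.
EXCLUDED_SPEAKERS = frozenset({"p280", "p315"})


def usable_speakers(root, excluded=EXCLUDED_SPEAKERS):
    """Return the sorted names of speaker directories under `root`,
    skipping the excluded speakers and any non-directory entries."""
    root = Path(root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and entry.name not in excluded
    )
```

Calling `usable_speakers` on the directory that holds the speaker folders should return a list that contains neither p280 nor p315.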

Training

Adjust hparameters.yaml, especially the train section.

train:
  batch_size: 18 # Dependent on GPU memory size
  lr: 0.00003
  weight_decay: 0.00
  num_workers: 64 # Dependent on CPU cores
  gpus: 2 # number of GPUs
  opt_eps: 1e-9
  beta1: 0.5
  beta2: 0.999
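The README does not say which optimizer consumes these fields. Assuming they feed an Adam-style optimizer, the sketch below shows how lr, beta1, beta2, opt_eps and weight_decay would map onto the keyword arguments of torch.optim.Adam. The helper `adam_kwargs` is hypothetical. The float() conversions guard against YAML loaders (PyYAML, for instance) that read `1e-9` as a string.

```python
def adam_kwargs(train_cfg):
    """Translate the train section into Adam keyword arguments (assumed mapping)."""
    return {
        "lr": float(train_cfg["lr"]),
        "betas": (float(train_cfg["beta1"]), float(train_cfg["beta2"])),
        "eps": float(train_cfg["opt_eps"]),
        "weight_decay": float(train_cfg["weight_decay"]),
    }


# Values copied from the train section; opt_eps is left as a string on purpose.
train_cfg = {
    "lr": 0.00003,
    "weight_decay": 0.00,
    "opt_eps": "1e-9",
    "beta1": 0.5,
    "beta2": 0.999,
}

kwargs = adam_kwargs(train_cfg)
# Assumed usage: torch.optim.Adam(model.parameters(), **kwargs)
print(kwargs["betas"])  # (0.5, 0.999)
```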

To train on a single speaker, use VCTKSingleSpkDataset instead of VCTKMultiSpkDataset as the dataset in dataloader.py, and set batch_size=1 for the validation dataloader.
Then adjust the data section in hparameters.yaml.

data:
  dir: '/DATA1/VCTK/VCTK-Corpus/wav48/p225' #dir/spk/format
  format: '*mic1.pt'
  cv_ratio: (223./231., 8./231., 0.00) #train/val/test
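The cv_ratio fractions share a denominator of 231, which suggests (as an inference, not something the README states) that this speaker has 231 files, split into 223 for training, 8 for validation and 0 for testing. The sketch below turns such ratios into file counts. `split_counts` is a hypothetical helper, and the repository may compute its split differently.

```python
def split_counts(n_files, ratios):
    """Convert (train, val, test) ratios into integer file counts.

    Train and validation counts are rounded; the test split takes the remainder
    so the three counts always sum to n_files.
    """
    train_ratio, val_ratio, _test_ratio = ratios
    n_train = round(n_files * train_ratio)
    n_val = round(n_files * val_ratio)
    n_test = n_files - n_train - n_val
    if n_test < 0:
        raise ValueError("train and val ratios exceed the number of files")
    return n_train, n_val, n_test


print(split_counts(231, (223.0 / 231.0, 8.0 / 231.0, 0.00)))  # (223, 8, 0)
```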

Run trainer.py.

$ python trainer.py

To resume training from a checkpoint, see the parser options:

    import argparse

    parser = argparse.ArgumentParser()
    # Epoch number of the checkpoint to resume from
    parser.add_argument('-r', '--resume_from', type=int,
                        required=False, help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
                        required=False, help="Significant change occurred, use this")
    # Load the EMA checkpoint instead of the regular one
    parser.add_argument('-e', '--ema', action="store_true",
                        required=False, help="Start from ema checkpoint")
    args = parser.parse_args()
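To see how these flags combine, the sketch below rebuilds the same parser and feeds it an explicit argument list, which is equivalent to running `python trainer.py -r 100 -e`. The epoch number 100 is only an illustrative value, and `build_parser` is a wrapper name introduced here.

```python
import argparse


def build_parser():
    """Recreate the training parser shown above."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-r', '--resume_from', type=int,
                        required=False, help="Resume Checkpoint epoch number")
    parser.add_argument('-s', '--restart', action="store_true",
                        required=False, help="Significant change occurred, use this")
    parser.add_argument('-e', '--ema', action="store_true",
                        required=False, help="Start from ema checkpoint")
    return parser


# Resume from the (illustrative) epoch-100 checkpoint, using the EMA weights.
args = build_parser().parse_args(['-r', '100', '-e'])
print(args.resume_from, args.restart, args.ema)  # 100 False True

# With no flags, training starts fresh: resume_from is None.
fresh = build_parser().parse_args([])
print(fresh.resume_from)  # None
```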

During training, the TensorBoard logger records the loss, spectrograms, and audio.

$ tensorboard --logdir=./tensorboard --bind_all

Evaluation

Run for_test.py or test.py.

$ python test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}
or
$ python for_test.py -r {checkpoint_number} {-e:option, if ema} {--save:option}

See the parser for the available options:

    import argparse

    parser = argparse.ArgumentParser()
    # Epoch number of the checkpoint to evaluate (mandatory)
    parser.add_argument('-r', '--resume_from', type=int,
                        required=True, help="Resume Checkpoint epoch number")
    parser.add_argument('-e', '--ema', action="store_true",
                        required=False, help="Start from ema checkpoint")
    parser.add_argument('--save', action="store_true",
                        required=False, help="Save file")

We provide Lightning-style test code in test.py, but it is device dependent, so we recommend using for_test.py instead.
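The command templates above can be assembled programmatically, for example when evaluating several checkpoints in a row. The following is a minimal sketch. `eval_command` is a hypothetical helper, not part of the repository, and it defaults to for_test.py as recommended.

```python
import sys


def eval_command(checkpoint, ema=False, save=False, script="for_test.py"):
    """Build the argv list for an evaluation run.

    checkpoint: epoch number passed to -r (required by the parser)
    ema:        add -e to load the EMA checkpoint
    save:       add --save to write the output files
    """
    cmd = [sys.executable, script, "-r", str(int(checkpoint))]
    if ema:
        cmd.append("-e")
    if save:
        cmd.append("--save")
    return cmd


# Illustrative epoch number; pass the list to subprocess.run(cmd, check=True) to execute it.
cmd = eval_command(100, ema=True, save=True)
print(" ".join(cmd[1:]))  # for_test.py -r 100 -e --save
```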

References

This implementation uses code from the following repositories:

This README and the webpage for the audio samples are inspired by:

The audio samples on our webpage are partially derived from:

VCTK dataset (0.92): 46 hours of English speech from 108 speakers.

Repository Structure

.
├── Dockerfile
├── dataloader.py           # Dataloader for train/val(=test)
├── filters.py              # Filter implementation
├── test.py                 # Test with lightning_loop.
├── for_test.py             # Test with for_loop. Recommended due to device dependency of lightning
├── hparameter.yaml         # Config
├── lightning_model.py      # NU-Wave implementation. DDPM is based on ivanvok's WaveGrad implementation
├── model.py                # NU-Wave model based on lmnt-com's DiffWave implementation
├── requirement.txt         # Required libraries
├── sampling.py             # Sampling a file
├── trainer.py              # Lightning trainer
├── README.md
├── LICENSE
├── utils
│  ├── stft.py              # STFT layer
│  ├── tblogger.py          # Tensorboard Logger for lightning
│  └── wav2pt.py            # Preprocessing
└── docs                    # For github.io
   └─ ...

Citation & Contact

If this repository is useful for your research, please consider citing it. The BibTeX entry will be updated after the INTERSPEECH 2021 conference.

@article{lee2021nuwave,
  title={NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling},
  author={Lee, Junhyeok and Han, Seungu},
  journal={arXiv preprint arXiv:2104.02321},
  year={2021}
}

If you have any questions or inquiries, please contact Junhyeok Lee at [email protected]
