Symbolic Music Generation with Diffusion Models

Last update: Jan 07, 2023

Supplementary code release for our work Symbolic Music Generation with Diffusion Models.

Installation

All code is written in Python 3 (Anaconda recommended). To install the dependencies:

pip install -r requirements.txt

A copy of the Magenta codebase is required for access to MusicVAE and related components; installation instructions are in the public Magenta repository. You will also need to download pretrained MusicVAE checkpoints. For our experiments, we use the 2-bar melody model.

Datasets

We use the Lakh MIDI Dataset to train our models. Follow these instructions to download and build the Lakh MIDI Dataset.

To encode the Lakh dataset with MusicVAE, use scripts/generate_song_data_beam.py:

python scripts/generate_song_data_beam.py \
  --checkpoint=/path/to/musicvae-ckpt \
  --input=/path/to/lakh_tfrecords \
  --output=/path/to/encoded_tfrecords

To preprocess and generate fixed-length latent sequences for training diffusion and autoregressive models, refer to scripts/transform_encoded_data.py:

python scripts/transform_encoded_data.py \
  --encoded_data=/path/to/encoded_tfrecords \
  --output_path=/path/to/preprocess_tfrecords \
  --mode=sequences \
  --context_length=32
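
To illustrate what "fixed-length latent sequences" means here, the following is a minimal sketch and not the code in scripts/transform_encoded_data.py. It assumes each song is a list of MusicVAE latent vectors, that windows do not overlap, and that a trailing remainder shorter than the context length is dropped. The function name split_into_contexts and the toy dimensions are hypothetical.

```python
from typing import List, Sequence

Latent = List[float]  # one MusicVAE embedding (2 bars of melody)


def split_into_contexts(
    song: Sequence[Latent], context_length: int = 32
) -> List[List[Latent]]:
    """Cut one song's latent sequence into non-overlapping fixed-length windows.

    Trailing latents that do not fill a whole window are dropped
    (an assumption made for this sketch).
    """
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    n_full = len(song) // context_length
    return [
        list(song[i * context_length:(i + 1) * context_length])
        for i in range(n_full)
    ]


# Toy song: 70 latents of dimension 4 (chosen only to keep the example small).
song = [[float(i)] * 4 for i in range(70)]
windows = split_into_contexts(song, context_length=32)
print(len(windows), len(windows[0]))  # prints: 2 32
```

With --context_length=32 and the 2-bar melody model, each training example would then cover 64 bars of melody.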

Training

Diffusion

python train_ncsn.py --flagfile=configs/ddpm-mel-32seq-512.cfg

TransformerMDN

python train_mdn.py --flagfile=configs/mdn-mel-32seq-512.cfg

Sampling and Generation

Diffusion

python sample_ncsn.py \
  --flagfile=configs/ddpm-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples
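
For readers unfamiliar with what a DDPM sampler does, here is a toy sketch of ancestral sampling in plain Python. It is not the implementation in sample_ncsn.py: the linear noise schedule, its start and end values, the function names, and the "oracle" noise predictor (which stands in for a trained network and assumes every training example equals a fixed TARGET vector) are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List


def linear_betas(num_steps: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Linear noise schedule (illustrative values, not the repo's config)."""
    if num_steps == 1:
        return [start]
    step = (end - start) / (num_steps - 1)
    return [start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar[t] = product of (1 - beta[s]) for s <= t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out


def ddpm_sample(
    eps_model: Callable[[List[float], int], List[float]],
    dim: int,
    betas: List[float],
    seed: int,
) -> List[float]:
    """Start from Gaussian noise and apply the DDPM reverse step for each t."""
    rng = random.Random(seed)
    alpha_bars = cumulative_alphas(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)  # predicted noise at step t
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        mean = [inv_sqrt_alpha * (xi - coef * ei) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # one common choice of step variance
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


# Stand-in for a trained network: the exact noise when all data equals TARGET.
TARGET = [0.5, -1.0, 2.0]
betas = linear_betas(1000)
alpha_bars = cumulative_alphas(betas)


def oracle(x: List[float], t: int) -> List[float]:
    scale = math.sqrt(1.0 - alpha_bars[t])
    return [(xi - math.sqrt(alpha_bars[t]) * mi) / scale for xi, mi in zip(x, TARGET)]


sample = ddpm_sample(oracle, dim=len(TARGET), betas=betas, seed=42)
print(sample)  # ends at TARGET up to floating-point error
```

Fixing the seed, as --sample_seed does in the command above, makes the sketch reproducible: calling ddpm_sample again with the same seed returns the same vector.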

TransformerMDN

python sample_mdn.py \
  --flagfile=configs/mdn-mel-32seq-512.cfg \
  --sample_seed=42 \
  --sample_size=1000 \
  --sampling_dir=/path/to/latent-samples

Decoding sequences

To convert sequences of embeddings (generated by diffusion or TransformerMDN models) to sequences of MIDI events, refer to scripts/sample_audio.py.

python scripts/sample_audio.py \
  --input=/path/to/latent-samples/[ncsn|mdn] \
  --output=/path/to/audio-midi \
  --n_synth=1000 \
  --include_wav=True
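
Conceptually, decoding turns each latent in a sampled sequence into a short melody segment and joins the segments end to end. The sketch below shows only that stitching step and is not the code in scripts/sample_audio.py. The Note type, stitch_segments, and fake_decoder are hypothetical names; fake_decoder stands in for the MusicVAE decoder, and the 4-second segment length assumes 2 bars of 4/4 at 120 quarter notes per minute.

```python
from typing import Callable, List, NamedTuple, Sequence


class Note(NamedTuple):
    pitch: int    # MIDI pitch number
    start: float  # seconds
    end: float    # seconds


def stitch_segments(
    latents: Sequence[Sequence[float]],
    decode_segment: Callable[[Sequence[float]], List[Note]],
    segment_seconds: float = 4.0,
) -> List[Note]:
    """Decode each latent and shift its notes to the segment's place in the song."""
    song: List[Note] = []
    for index, z in enumerate(latents):
        offset = index * segment_seconds
        for note in decode_segment(z):
            song.append(Note(note.pitch, note.start + offset, note.end + offset))
    return song


def fake_decoder(z: Sequence[float]) -> List[Note]:
    """Placeholder decoder: one note whose pitch depends on the first latent value."""
    return [Note(pitch=60 + int(z[0]), start=0.0, end=1.0)]


latents = [[0.0], [2.0], [4.0]]
melody = stitch_segments(latents, fake_decoder)
print([(n.pitch, n.start) for n in melody])
# prints: [(60, 0.0), (62, 4.0), (64, 8.0)]
```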

Citing

If you use this code, please cite it as:

@inproceedings{
  mittal2021symbolicdiffusion,
  title={Symbolic Music Generation with Diffusion Models},
  author={Gautam Mittal and Jesse Engel and Curtis Hawthorne and Ian Simon},
  booktitle={Proceedings of the 22nd International Society for Music Information Retrieval Conference},
  year={2021},
  url={https://archives.ismir.net/ismir2021/paper/000058.pdf}
}

Note

This is not an official Google product.


Owner

Magenta
