The training code for the 4th place model at MDX 2021 leaderboard A.

Last update: Dec 18, 2022

Overview

This repository contains the training code of our winning model at Music Demixing Challenge 2021, which got the 4th place on leaderboard A (6th in overall), and help us (Kazane Ryo no Danna) winned the bronze prize.

Model Summary

Our final winning approach blends the outputs from three models, which are:

model 1: A X-UMX model [1] which is initialized with the weights of the official baseline, and is fine-tuned with a modified Combinational Multi-Domain Loss from [1]. In particular, we implement and apply a differentiable Multichannel Wiener Filter (MWF) [2] before the loss calculation, and compute the frequency-domain L2 loss with raw complex values.
model 2: A U-Net which is similar to Spleeter [3], where all convolution layers are replaced by D3 Blocks from [4], and two layers of 2D local attention are applied at the bottleneck similar to [5].
model 3: A modified version of Demucs [6], where the original decoding module is replaced by four independent decoders, each of which corresponds to one source.

We didn't encounter overfitting in our pilot experiments, so we used the full musdb training set for all the models above, and stopped training upon convergence of the loss function.

The weights of the three outputs are determined empirically:

	Drums	Bass	Other	Vocals
model 1	0.2	0.1	0	0.2
model 2	0.2	0.17	0.5	0.4
model 3	0.6	0.73	0.5	0.4

For the spectrogram-based models (model 1 and 2), we apply MWF to the outputs with one iteration before the fusion.

[1] Sawata, Ryosuke, et al. "All for One and One for All: Improving Music Separation by Bridging Networks." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[2] Antoine Liutkus, & Fabian-Robert Stöter. (2019). sigsep/norbert: First official Norbert release (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.3269749

[3] Hennequin, Romain, et al. "Spleeter: a fast and efficient music source separation tool with pre-trained models." Journal of Open Source Software 5.50 (2020): 2154.

[4] Takahashi, Naoya, and Yuki Mitsufuji. "D3net: Densely connected multidilated densenet for music source separation." arXiv preprint arXiv:2010.01733 (2020).

[5] Wu, Yu-Te, Berlin Chen, and Li Su. "Multi-Instrument Automatic Music Transcription With Self-Attention-Based Instance Segmentation." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 2796-2809.

[6] Défossez, Alexandre, et al. "Music source separation in the waveform domain." arXiv preprint arXiv:1911.13254 (2019).

How to reproduce the training

Install Requirements / Build Virtual Environment

We recommend using conda.

conda env create -f environment.yml
conda activate demixing

Prepare Data

Please download musdb, and edit the "root" parameters in all the json files listed under configs/ to the path where you have the dataset.

Training Model 1

First download the pre-trained model:

wget https://zenodo.org/record/4740378/files/pretrained_xumx_musdb18HQ.pth

Copy the weights for initializing our model:

python xumx_weights_convert.py pretrained_xumx_musdb18HQ.pth xumx_weights.pth

Start training!

python train.py configs/x_umx_mwf.json --weights xumx_weights.pth

Checkpoints will be located under saved/. The config was set to run on a single RTX 3070.

Training Model 2

python train.py configs/unet_attn.json --device_ids 0 1 2 3

Checkpoints will be located under saved/. The config was set to run on four Tesla V100.

Training Model 3

python train.py configs/demucs_split.json

Checkpoints will be located under saved/. The config was set to run on a single RTX 3070, using gradient accumulation and mixed precision training.

Tensorboard Logging

You can monitor the training process using tensorboard:

tesnorboard --logdir runs/

Inference

First make sure you installed danna-sep. Then convert your checkpoints into jit scripts and replace the files under DANNA_CHECKPOINTS:

python jit_convert.py configs/x_umx_mwf.json saved/CrossNet\ Open-Unmix_checkpoint_XXX.pt $DANNA_CHECKPOINTS/xumx_mwf.pth

python jit_convert.py configs/unet_attn.json saved/UNet\ Attention_checkpoint_XXX.pt $DANNA_CHECKPOINTS/unet_attention.pth

python jit_convert.py configs/demucs_split.json saved/DemucsSplit_checkpoint_XXX.pt $DANNA_CHECKPOINTS/demucs_4_decoders.pth

Now you can use danna-sep to separate you favorite music and see how it works!

The training code for the 4th place model at MDX 2021 leaderboard A.

Related tags

Overview

Model Summary

How to reproduce the training

Install Requirements / Build Virtual Environment

Prepare Data

Training Model 1

Training Model 2

Training Model 3

Tensorboard Logging

Inference

Additional Resources

Owner

Chin-Yun Yu

This repo is to provide a list of literature regarding Deep Learning on Graphs for NLP

Turn clang-tidy warnings and fixes to comments in your pull request

History Aware Multimodal Transformer for Vision-and-Language Navigation

A PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis

An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

Python3 to Crystal Translation using Python AST Walker

Continuously update some NLP practice based on different tasks.

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields

Ukrainian TTS (text-to-speech) using Coqui TTS

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

This project uses word frequency and Term Frequency-Inverse Document Frequency to summarize a text.

Translation to python of Chris Sims' optimization function

Anomaly Detection 이상치 탐지 전처리 모듈

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

This is the Alpha of Nutte language, she is not complete yet / Essa é a Alpha da Nutte language, não está completa ainda

Задания КЕГЭ по информатике 2021 на Python

Multilingual word vectors in 78 languages

DANeS is an open-source E-newspaper dataset by collaboration between DATASET JSC (dataset.vn) and AIV Group (aivgroup.vn)

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

This repository will contain the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"