Utility tools for the "Divide and Remaster" dataset, introduced as part of the Cocktail Fork problem paper

Last update: Dec 11, 2022

Divide and Remaster Utility Tools

The DnR dataset is built from three well-established audio datasets: LibriSpeech, Free Music Archive (FMA), and Freesound Dataset 50k (FSD50K). We offer the dataset at both 16kHz and 44.1kHz sampling rates, along with time-stamped annotations for each of the classes (genre for 'music', audio tags for 'sound-effects', and transcription for 'speech'). Below, we give more information on how the dataset is built and what exactly it consists of. We also go over the process of building the dataset from scratch, for the cases where that is needed.

Dataset Overview
Get the DnR Dataset
Resources and Support

Dataset Overview

The Divide and Remaster (DnR) dataset aims to support research on a relatively unexplored case of source separation: mixtures whose sources are music, speech, and sound effects (SFX). The dataset is built from three well-established datasets. Consequently, if one wants to build DnR from scratch, these three datasets have to be downloaded first. Alternatively, DnR is also available on Zenodo.

Get the DnR Dataset

In order to obtain DnR, several options are available depending on the task at hand:

Download

DnR-HQ (44.1kHz) is available on Zenodo at the following link, or simply run:

link to the Zenodo dataset coming soon ...

Alternatively, if DnR-16kHz is needed, please first download DnR-HQ locally. You can then downsample the dataset (either in-place or not) by cloning the dnr-utils repository and running:

python dnr_utils.py --task=downsample --inplace=True
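For reference, going from 44.1kHz to 16kHz is a rational rate change. The sketch below only works out the reduced ratio and the resulting number of samples. It is not the code used by dnr_utils.py, and the names `RATIO` and `downsampled_length` are hypothetical.

```python
from fractions import Fraction

SRC_RATE = 44_100  # DnR-HQ sampling rate (Hz)
DST_RATE = 16_000  # DnR-16kHz sampling rate (Hz)

# Fraction reduces the ratio automatically: 16000 / 44100 == 160 / 441.
RATIO = Fraction(DST_RATE, SRC_RATE)


def downsampled_length(num_samples: int) -> int:
    """Number of samples after the rate change, rounded up (integer ceiling)."""
    return -(-num_samples * RATIO.numerator // RATIO.denominator)
```

Under these assumptions, one second of DnR-HQ audio (44100 samples) maps to 16000 samples, so the downsampled files take roughly 36% of the original space.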

Building DnR From Scratch

In this section, we go over the DnR building process. Since DnR is directly drawn from FSD50K, LibriSpeech/LibriVox, and FMA, we first need to download these datasets. Please head to the following links for more details on how to get them:

Datasets Downloads

FSD50K
FMA-Medium Set
LibriSpeech/LibriVox

Please note that for FMA, only the medium set is required. In addition to the audio files, the metadata should also be downloaded. For LibriSpeech, DnR uses dev-clean, test-clean, and train-clean-100. DnR uses the folder structure and metadata from LibriSpeech, but ultimately builds the LibriSpeech-HQ dataset from the original LibriVox mp3s, which is why both are needed to build DnR.

After download, all four datasets are expected to be found in the same root directory. The root tree may look something like the one below. As the standardization script looks for specific file names, please make sure that all directory names conform to the ones shown:

root
├── fma-medium
│   ├── fma_metadata
│   │   ├── genres.csv
│   │   └── tracks.csv
│   ├── 008
│   ├── 009
│   ├── 010
│   └── ...
├── fsd50k
│   ├── FSD50K.dev_audio
│   ├── FSD50K.eval_audio
│   └── FSD50K.ground_truth
│       ├── dev.csv
│       ├── eval.csv
│       └── vocabulary.csv
├── librispeech
│   ├── dev-clean
│   ├── test-clean
│   └── train-clean-100
└── librivox
    ├── 14
    ├── 16
    ├── 17
    └── ...
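Before running the standardization step, it can help to confirm that the layout matches the tree above. The following is a minimal sketch of such a check. It is not part of dnr-utils, and the names `EXPECTED` and `missing_entries` are hypothetical.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED = [
    "fma-medium/fma_metadata/genres.csv",
    "fma-medium/fma_metadata/tracks.csv",
    "fsd50k/FSD50K.dev_audio",
    "fsd50k/FSD50K.eval_audio",
    "fsd50k/FSD50K.ground_truth/dev.csv",
    "fsd50k/FSD50K.ground_truth/eval.csv",
    "fsd50k/FSD50K.ground_truth/vocabulary.csv",
    "librispeech/dev-clean",
    "librispeech/test-clean",
    "librispeech/train-clean-100",
    "librivox",
]


def missing_entries(root):
    """Return the expected files/directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries("./root")` returns an empty list when every expected entry is present, and otherwise lists the relative paths that still need to be downloaded or renamed.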

Datasets Standardization

Once all four datasets are downloaded, some standardization work needs to be taken care of. The standardization process can be executed by running standardization.py, which can be found in the dnr-utils repository. Prior to running the script, you may want to install the necessary dependencies listed in requirements.txt with pip install -r requirements.txt. Note: pydub uses ffmpeg under the hood, so a system install of ffmpeg is required. Please see pydub's install instructions for more information. The standardization command may look something like:

python standardization.py --fsd50k-path=./FSD50K --fma-path=./FMA --librivox-path=./LibriVox --librispeech-path=./LibriSpeech --dest-dir=./dest --validate-audio=True
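Since pydub relies on a system install of ffmpeg, a quick check that the executable is on the PATH can save a failed run. This is a minimal sketch using only the standard library. The helper name `ffmpeg_available` is hypothetical and is not part of dnr-utils or pydub.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None
```

If `ffmpeg_available()` returns False, install ffmpeg through your system's package manager before running standardization.py.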

DnR Dataset Compilation

Once the three resulting datasets are standardized, we are finally ready to compile DnR. At this point you should already have cloned the dnr-utils repository, which contains two key files:

config.py contains the configuration entries needed by the main build script. Set all the paths pointing to your local datasets and ground-truth files there.
compile_dataset.py compiles a given set (train, val, or eval). For example, run the following commands, one for each set:

python compile_dataset.py with cfg.train

python compile_dataset.py with cfg.val

python compile_dataset.py with cfg.eval
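The three commands above can also be issued from a small driver script. The sketch below is one way to do that, assuming it is run from the dnr-utils directory. The names `compile_command` and `compile_all` are hypothetical, and the `run` parameter exists only so the function can be exercised without launching a real build.

```python
import subprocess
import sys

SPLITS = ("train", "val", "eval")


def compile_command(split):
    """Build the argument list for one split, mirroring the commands above."""
    return [sys.executable, "compile_dataset.py", "with", f"cfg.{split}"]


def compile_all(run=subprocess.run):
    """Compile every split in order, stopping at the first failure."""
    for split in SPLITS:
        # check=True makes subprocess.run raise if the command exits non-zero.
        run(compile_command(split), check=True)
```

Calling `compile_all()` runs the splits sequentially, so a failure in train prevents val and eval from starting.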

Known Issues

Some known bugs and issues that we are aware of. If yours is not listed below, feel free to open a new issue here:

If building from scratch, pydub will fail to read 15 mp3 files from the FMA medium set and will return the following error: [mp3 @ 0x559b8b084880] Failed to read frame size: Could not seek to 1026.
If building DnR from scratch, the script may return the following error, coming from pyloudnorm: Audio must have length greater than the block size. This happens because some audio segments, especially SFX events, may be shorter than 0.2 seconds, which is the minimum length (window) required by pyloudnorm for normalizing the audio. We simply ignore these segments.
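The short-segment case can be illustrated with a simple length filter. This is a sketch under stated assumptions, not the logic used in dnr-utils: it takes the 0.2-second window mentioned above as the threshold, treats a segment of exactly that length as acceptable, and uses the hypothetical names `long_enough` and `keep_normalizable`.

```python
MIN_DURATION_MS = 200  # the 0.2 s window mentioned above (assumed threshold)


def long_enough(num_samples: int, sample_rate: int,
                min_ms: int = MIN_DURATION_MS) -> bool:
    """True if the segment lasts at least min_ms milliseconds."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return num_samples * 1000 >= min_ms * sample_rate


def keep_normalizable(segments, sample_rate):
    """Filter (name, num_samples) pairs, dropping segments that are too short."""
    return [(name, n) for name, n in segments if long_enough(n, sample_rate)]
```

At 44.1kHz the threshold works out to 8820 samples, and at 16kHz to 3200 samples.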

Contact and Support

Have an issue, concern, or question about DnR or its utility tools? If so, please open an issue here.

For any other inquiries, feel free to shoot an email at: [email protected], my name is Darius Petermann ;)

Owner

Darius Petermann
