Audio2midi - Automatic Audio-to-symbolic Arrangement

Related tags

Audioaudio2midi
Overview

Automatic Audio-to-symbolic Arrangement

This is the repository of the project "Audio-to-symbolic Arrangement via Cross-modal Music Representation Learning" by Ziyu Wang, Dejing Xu, Gus Xia and Ying Shan (arXiv:2112.15110).

Model Overview

We propose an audio-to-symbolic generative model to transfer an input audio to its piano arrangement (in MIDI format). The task is similar to piano cover song production based on the original pop songs, or piano score reduction based on a classical symphony.

The input audio is a pop-song audio under arbitrary instrumentation. The output is a MIDI piano arrangement of the original audio which keeps most musical information (incdluding chord, groove and lead melody). The model is built upon the previous projects including EC2-VAE, PianoTree VAE, and Poly-Dis. The novelty of this study is the cross-modal representation disentanglement design and the 3-step training strategy shown below.

workflow

Application

This audio-to-symbolic conversion and arrangement module can lead to many applications (e.g., music score generation, automatic acccompaniment etc.) In this project, we demonstrate a specific application: audio-level pop style transfer, in which we transfer a pop song "vocal + (full band) accompaniment" to "vocal + piano accompaniment". The automation is an integration of spleeter source separation, audio-to-symbolic conversion, performance rendering, and synthesis (as shown in the diagram below).

style_transfer_app

Quick Start

In demo/audio2midi_out we presents generated results in separated song folders. In each song folder, there is a MIDI file and an audio file.

  • The MIDI file is the generated piano arrangement by our model (with rule-based performance rendering).
  • The audio file is a combination of the generated piano arragement and the original vocal (separated by Spleeter) using GarageBand.

In this section, we demonstrate how to use our model to reproduce the demo. Later on, you can generate piano arrangement with your own inputs.

Step One: Prepare the python environment specified by requirements.txt.

Step Two: Download the pre-trained model parameter files and put them under params. (Link)

Step Three: Source separation the audio files at demo/example_audio using Spleeter and output the results at demo/spleeter_out (see the original repository for more detail). If you do not want to run Spleeter right now, we have prepare the seprated result for you at demo/spleeter_out.

Step Four: Run the following command at the project root directory:

python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example1/accompaniment.wav ./demo/audio2midi_out/example1
python scripts/inference_pop_style_transfer.py ./demo/spleeter_out/example2/accompaniment.wav ./demo/audio2midi_out/example2

Here, the first parameter after inference_pop_style_transfer.py is the input audio file path, and the second parameter is the output MIDI folder. The command also allows applying different models, switching autoregressive mode, and many other options. See more details by

python scripts/inference_pop_style_transfer.py -h

The generated MIDI files can be found at demo/audio2midi_out. Then, the MIDI arrangement and vocal audio can be combined and further processed in the DAW.

Project Details

Before discussing the audio-to-midi conversion in more detail, we introduce different models and training stages that can be specified in the command.

Model Types

There are four model types (or model_id):

  • (a2s): the proposed model (requiring chord extraction because it learns disentangled latent chord representation).
  • (a2s-nochd): the proposed model without chord latent representation (i.e., removing chord encoder or chord decoder).
  • (supervised): pure supervised training (audio encoder + symbolic decoder (without symbolic feature decoding)). No resampling and KL penalty.
  • (supervised+kl): model architecture is same as supervised. Resampling and KL penalty is applied (to audio-texture representation).

In general, a2s and a2s-nochd perform the best as they are the proposed models. The rest are baseline models in the paper. (Note: the other baseline model (Chord-based generation) is not implemented because it does not depend on any audio information.)

Stages (Autoregressive Mode)

As shown in the diagram above, a2s and a2s-nochd have 3 training stages, where stage 1 and 2 are not ready to be used, and stage 3 has two options (i.e., pure audio-based and autoregression).

  • (Stage 3a): pure audio-based. The model input is the audio itself.
  • (Stage 3b): autoregression. The model input is the audio itself and previous 2-measure output.

During inference, autoregressive=True mode means we use a stage 3b model auto regressively, and use stage 3a model to compute the initial segment. On the other hand, autoregressive=False model means we use a stage 3a model to arrangement every 2-measure segment independently. In general, autoregressive=False gives output of slightly higher quality.

Audio2midi Command

Run audio2midi in command line using the command:

python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path>

Note: the command should be run at the project root directory. For clarity, the user can only specify the output MIDI folder (where possibly multiple versions of MIDI arrangement will be produced in the same folder). To specify output MIDI filename, see the python interfance section below.

The command has the following functions:

  1. To specify model_id using -m or --model (default to be a2s), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model a2s-nochd
    
  2. To turn on autoregressive mode using -a or --auto (detault to be autoregressive=False), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --auto
    
  3. By default, the beat and chord analysis (a preprocessing step by madmom) will be applied to the separated accompaniment track. The analysis can also be applied to the original audio using -o or --original, (though there are usually no aparent difference), e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --original <input_audio_path>
    
  4. At the early stage of this research, we only have stage 3a and stage 3b has not been developed yet. We provide the model parameter of the old stage 3a model and name it after the alternative model. The generation results also sound good in general. The alternative model can be called by --alt, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --alt
    

    Alternative model can be also called in autoregressive mode, where the model for initial computation will be replaced with the alternation.

  5. To run through all the model types, autoregressive modes and alternative models, using --model all, e.g.,

    python scripts/inference_pop_style_transfer.py <input_acc_audio_path> <output_midi_folder_path> --model all
    

For more detail, check it out using

python scripts/inference_pop_style_transfer.py -h

Audio2midi Python Interface

To use the model in a more flexible fashion (e.g., use your own trained model parameter, combine the a2s routine in your project, or run multiple files at a time), we provide the python interface. Specifically, call the functions in the pop_style_transfer package. For example, in a python script outside this project, we can write the following python code:

from audio2midi.pop_style_transfer import acc_audio_to_midi, load_model

model, model0 = load_model(model_id='a2s', 
                           autoregressive=False,
                           alt=False
                          )

acc_audio_to_midi(input_acc_audio_path=...,
                  output_midi_path=...,
                  model=model, 
                  model0=model0,
                  input_audio_path=..., 
                  input_analysis_npy_path=...,
                  save_analysis_npy_path=..., 
                  batch_size=...
                 )

Note: Unlike in the command line mode where output location can only be a folder directory, we can specify filename in the python interface. Also, chord and beat preprocessing can be loaded or stored, and batch size can be chaged.

Data

We use the POP909 dataset and its time-aligned audio files. The audio files are colloected from the internet. The data should be processed and put under the data folder. Specifically, there are 4 sub-folders:

  • data/audio-midi-combined: contain 909 folders of 909 songs. In ecah folder, we have
    • audio.mp3 the original audio
    • midi folder containing the MIDI arrangement and annotation files.
    • split audio folder containing the vocal-accompaniment separation result using spleeter:2stems model.
  • data/quantized_pop909_4_bin: containing 909 npz files storing pre-processed symbolic data from MIDI and annotation files.
  • data/audio_stretched: containing 909 pairs of wav and pickle files storing the accompaniment audio time-stretched to 13312 frames per (symbolic) beat under 22050 sample rate (approximately 99.384 bpm).
  • data/train_split: containing a pickle file for train and valid split ids at song level.

The pre-processed symbolic-part of the dataset (i.e., quantized_pop909_4_bin and train_split can be downloaded at link.

Training the model

To train the model, run the scripts in ./scripts/run*.py, at project root directory. The result will be saved in the result folder. The model should be trained in a sequential manner. For example, run

python ./scripts/run_a2s_stage1.py

When the stage i model is trained, put the model parameter file path in line 13 of /scripts/run_a2s_stage{j}.py, and run

python ./scripts/run_a2s_stage{j}.py

Here, (i, j) can be (i=1, j=2), (i=2, j=3), (i=2, j=4). (3 for stage 3a and 4 for stage 3b). The other model types can be run in a similar fashion. The training result can be downloaded at link.

Summary of Materials to Download

References

Owner
Ziyu Wang
Computer music, Deep Music Generation & Artificial musical intelligence
Ziyu Wang
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee

Mozilla 20.8k Jan 03, 2023
The project aims to develop a personal-assistant for Windows & Linux-based systems

The project aims to develop a personal-assistant for Windows & Linux-based systems. Samiksha draws its inspiration from virtual assistants like Cortana for Windows, and Siri for iOS. It has been desi

SHUBHANSHU RAI 1 Jan 16, 2022
Tune in is a Collaborative Music Playing Systems where multiple guests can join a room and enjoy the song being played

✨A collaborative music playing systems🎶 where multiple guests can join a room ➡🚪 and enjoy the song🎧 being played.

Vedansh Vijaywargiya 8 Nov 05, 2022
Python library for handling audio datasets.

AUDIOMATE Audiomate is a library for easy access to audio datasets. It provides the datastructures for accessing/loading different datasets in a gener

Matthias 121 Nov 27, 2022
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.

🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.

Jim Schwoebel 28 Dec 22, 2022
PatrikZero's CS:GO Hearing protection

Program that lowers volume when you die and get flashed in CS:GO. It aims to lower the chance of hearing damage by reducing overall sound exposure. Uses game state integration. Anti-cheat safe.

Patrik Žúdel 224 Dec 04, 2022
A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

A simple python script to play bell sound in your system infinitely, just for fun and experimental purposes

نافع الهلالي 1 Oct 29, 2021
Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums)

LAKH MuseNet MIDI Dataset Full LAKH MIDI dataset converted to MuseNet MIDI output format (9 instruments + drums) Bonus: Choir on Channel 10 Please CC

Alex 6 Nov 20, 2022
An AI for Music Generation

An AI for Music Generation

Hao-Wen Dong 1.3k Dec 31, 2022
Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals

Welcome to MARSYAS. MARSYAS is a software framework for rapid prototyping of audio applications, with flexibility and extensibility as primary concer

Marsyas Developers Group 364 Oct 31, 2022
The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases.

The venturimeter works on the principle of Bernoulli's equation, i.e., the pressure decreases as the velocity increases. The cross-section of the throat is less than the cross-section of the inlet pi

Shankar Mahadevan L 1 Dec 03, 2021
Users can transcribe their favorite piano recordings to MIDI files after installation

Users can transcribe their favorite piano recordings to MIDI files after installation

190 Dec 17, 2022
A voice assistant which can handle your everyday task and allows you to book items from your favourite store!

Voicely Table of Contents About The Project Built With Getting Started Prerequisites Installation Usage Roadmap Contributing License Contact Acknowled

Awantika Nigam 2 Nov 17, 2021
Music player and music library manager for Linux, Windows, and macOS

Ex Falso / Quod Libet - A Music Library / Editor / Player Quod Libet is a music management program. It provides several different ways to view your au

Quod Libet 1.2k Jan 07, 2023
Analyze, visualize and process sound field data recorded by spherical microphone arrays.

Sound Field Analysis toolbox for Python The sound_field_analysis toolbox (short: sfa) is a Python port of the Sound Field Analysis Toolbox (SOFiA) too

Division of Applied Acoustics at Chalmers University of Technology 69 Nov 23, 2022
A python program for visualizing MIDI files, and displaying them in a spiral layout

SpiralMusic_python A python program for visualizing MIDI files, and displaying them in a spiral layout For a hardware version using Teensy & LED displ

Gavin 6 Nov 23, 2022
Real-time audio visualizations (spectrum, spectrogram, etc.)

Friture Friture is an application to visualize and analyze live audio data in real-time. Friture displays audio data in several widgets, such as a sco

Timothée Lecomte 700 Dec 31, 2022
A Quick Music Player Made Fully in Python

Quick Music Player Made Fully In Python. Pure Python, cross platform, single function module with no dependencies for playing sounds. Installation & S

1 Dec 24, 2021
[Singing Log] Let your program learn to sing!

[Singing Log] Let your program learn to sing! You must have thought this was changelog when you saw the English title, but it's not, it's chànggēlog. What it does is allow your program to print logs

黄巍 22 Sep 03, 2022
Synthesia but open source, made in python and free

PyPiano Synthesia but open source, made in python and free Requirements are in requirements.txt If you struggle with installation of pyaudio, run : pi

DaCapo 11 Nov 06, 2022