Official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

Overview

MidiBERT-Piano


MIT License ARXIV LICENSE STAR ISSUE

Authors: Yi-Hui (Sophia) Chou, I-Chun (Bronwin) Chen

Introduction

This is the official repository for the paper, MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding.

With this repository, you can

  • pre-train a MidiBERT-Piano with your customized pre-trained dataset
  • fine-tune & evaluate on 4 downstream tasks
  • compare its performance with a Bi-LSTM

All the datasets employed in this work are publicly available.

Quick Start

If you'd like to reproduce the results (MidiBERT) shown in the paper, image-20210710185007453

  1. please download the checkpoints, and rename files like the following
MidiBERT/{CP/remi}/
result
└── finetune
	└── melody_default
		└── model_best.ckpt
	└── velocity_default
		└── model_best.ckpt
	└── composer_default
		└── model_best.ckpt
	└── emotion_default
		└── model_best.ckpt
  1. please refer to evaluation,

and you are free to go! (btw, no gpu is needed for evaluation)

Installation

  • Python3
  • Install generally used packages for MidiBERT-Piano:
git clone https://github.com/wazenmai/MIDI-BERT.git
cd MIDI-BERT
pip install -r requirements.txt

A. Prepare Data

All data in CP/REMI token are stored in data/CP & data/remi, respectively, including the train, valid, test split.

You can also preprocess as below.

1. download dataset and preprocess

  • Pop1K7
  • ASAP
    • Step 1: Download ASAP dataset from the link
    • Step 2: Use Dataset/ASAP_song.pkl to extract songs to Dataset/ASAP
  • POP909
    • preprocess to have 865 pieces in qualified 4/4 time signature
    • exploratory.py to get pieces qualified in 4/4 time signature and save at qual_pieces.pkl
    • preprocess.py to realign and preprocess
    • Special thanks to Shih-Lun (Sean) Wu
  • Pianist8
    • Step 1: Download Pianist8 dataset from the link
    • Step 2: Use Dataset/pianist8_(mode).pkl to extracts songs to Dataset/pianist8/mode
  • EMOPIA
    • Step 1: Download Emopia dataset from the link
    • Step 2: Use Dataset/emopia_(mode).pkl to extracts songs to Dataset/emopia/mode

2. prepare dict

dict/make_dict.py customize the events & words you'd like to add.

In this paper, we only use Bar, Position, Pitch, Duration. And we provide our dictionaries in CP & REMI representation.

dict/CP.pkl

dict/remi.pkl

3. prepare CP & REMI

./prepare_data/CP

  • Run python3 main.py . Please specify the dataset and whether you wanna prepare an answer array for the task (i.e. melody extraction, velocity prediction, composer classification and emotion classification).
  • For example, python3 main.py --dataset=pop909 --task=melody --dir=[DIR_TO_STORE_DATA]

./prepare_data/remi/

  • The same logic applies to preparing REMI data.

Acknowledgement: CP repo, remi repo

You may encode these midi files in different representations, the data split is in ***.

B. Pre-train a MidiBERT-Piano

./MidiBERT/CP and ./MidiBERT/remi

  • pre-train a MidiBERT-Piano
python3 main.py --name=default

A folder named CP_result/pretrain/default/ will be created, with checkpoint & log inside.

  • customize your own pre-training dataset Feel free to select given dataset and add your own dataset. To do this, add --dataset, and specify the respective path in load_data() function. For example,
# to pre-train a model with only 2 datasets
python3 main.py --name=default --dataset pop1k7 asap	

Acknowledgement: HuggingFace

Special thanks to Chin-Jui Chang

C. Fine-tune & Evaluate on Downstream Tasks

./MidiBERT/CP and ./MidiBERT/remi

1. fine-tuning

  • finetune.py
python3 finetune.py --task=melody --name=default

A folder named CP_result/finetune/{name}/ will be created, with checkpoint & log inside.

2. evaluation

  • eval.py
python3 eval.py --task=melody --cpu --ckpt=[ckpt_path]

Test loss & accuracy will be printed, and a figure of confusion matrix will be saved.

The same logic applies to REMI representation.

D. Baseline Model (Bi-LSTM)

./baseline/CP & ./baseline/remi

We seperate our baseline model to note-level tasks, which used a Bi-LSTM, and sequence-level tasks, which used a Bi-LSTM + Self-attention model.

For evaluation, in note-level task, please specify the checkpoint name. In sequence-level task, please specify only the output name you set when you trained.

  • Train a Bi-LSTM

    • note-level task
     python3 main.py --task=melody --name=0710
    • sequence-level task
     python3 main.py --task=composer --output=0710
  • Evaluate

    • note-level task:
     python3 eval.py --task=melody --ckpt=result/melody-LSTM/0710/LSTM-melody-classification.pth
    • sequence-level task
     python3 eval.py --task='composer' --ckpt=0710

The same logic applies to REMI representation.

Special thanks to Ching-Yu (Sunny) Chiu

E. Skyline

Get the accuracy on pop909 using skyline algorithm

python3 cal_acc.py

Since Pop909 contains melody, bridge, accompaniment, yet skyline cannot distinguish between melody and bridge.

There are 2 ways to report its accuracy:

  1. Consider Bridge as Accompaniment, attains 78.54% accuracy
  2. Consider Bridge as Melody, attains 79.51%

Special thanks to Wen-Yi Hsiao for providing the code for skyline algorithm.

Citation

If you find this useful, please cite our paper.

@article{midibertpiano,
  title={{MidiBERT-Piano}: Large-scale Pre-training for Symbolic Music Understanding},
  author={Yi-Hui Chou and I-Chun Chen and Chin-Jui Chang and Joann Ching, and Yi-Hsuan Yang},
  journal={arXiv preprint arXiv:2107.05223},
  year={2021}
}
Athena is the only tool that you will ever need to optimize your portfolio.

Athena Portfolio optimization is the process of selecting the best portfolio (asset distribution), out of the set of all portfolios being considered,

Indrajit 1 Mar 25, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and i

yifan liu 147 Dec 03, 2022
Tensorflow implementation of "BEGAN: Boundary Equilibrium Generative Adversarial Networks"

BEGAN in Tensorflow Tensorflow implementation of BEGAN: Boundary Equilibrium Generative Adversarial Networks. Requirements Python 2.7 or 3.x Pillow tq

Taehoon Kim 922 Dec 21, 2022
A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

CLIP4CMR A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval The original data and pre-calculate

24 Dec 26, 2022
This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivariant Continuous Convolution

Trajectory Prediction using Equivariant Continuous Convolution (ECCO) This is the codebase for the ICLR 2021 paper Trajectory Prediction using Equivar

Spatiotemporal Machine Learning 45 Jul 22, 2022
Add gui for YoloV5 using PyQt5

HEAD 更新2021.08.16 **添加图片和视频保存功能: 1.图片和视频按照当前系统时间进行命名 2.各自检测结果存放入output文件夹 3.摄像头检测的默认设备序号更改为0,减少调试报错 温馨提示: 1.项目放置在全英文路径下,防止项目报错 2.默认使用cpu进行检测,自

Ruihao Wang 65 Dec 27, 2022
PyTorch implementations of neural network models for keyword spotting

Honk: CNNs for Keyword Spotting Honk is a PyTorch reimplementation of Google's TensorFlow convolutional neural networks for keyword spotting, which ac

Castorini 475 Dec 15, 2022
《Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement》(ECCV 2020) GitHub: [fig9]

Unsupervised 3D Human Pose Representation [Paper] The implementation of our paper Unsupervised 3D Human Pose Representation with Viewpoint and Pose Di

42 Nov 24, 2022
Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA)

Using Convolutional Neural Networks (CNN) for Semantic Segmentation of Breast Cancer Lesions (BRCA). Master's thesis documents. Bibliography, experiments and reports.

Erick Cobos 73 Dec 04, 2022
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding (AAAI 2020) - PyTorch Implementation

Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding PyTorch implementation for the Scalable Attentive Sentence-Pair Modeling vi

Microsoft 25 Dec 02, 2022
Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant.

Marvis v1.0 Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant. About M.A.R.V.I.S. J.A.R.V.I.S. is a fictional character

Reda Mastouri 1 Dec 29, 2021
Our VMAgent is a platform for exploiting Reinforcement Learning (RL) on Virtual Machine (VM) scheduling tasks.

VMAgent is a platform for exploiting Reinforcement Learning (RL) on Virtual Machine (VM) scheduling tasks. VMAgent is constructed based on one month r

56 Dec 12, 2022
Numenta published papers code and data

Numenta research papers code and data This repository contains reproducible code for selected Numenta papers. It is currently under construction and w

Numenta 293 Jan 06, 2023
EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling

Frustratingly Simple Pretraining Alternatives to Masked Language Modeling This is the official implementation for "Frustratingly Simple Pretraining Al

Atsuki Yamaguchi 31 Nov 18, 2022
LinkNet - This repository contains our Torch7 implementation of the network developed by us at e-Lab.

LinkNet This repository contains our Torch7 implementation of the network developed by us at e-Lab. You can go to our blogpost or read the article Lin

e-Lab 158 Nov 11, 2022
A python tutorial on bayesian modeling techniques (PyMC3)

Bayesian Modelling in Python Welcome to "Bayesian Modelling in Python" - a tutorial for those interested in learning how to apply bayesian modelling t

Mark Regan 2.4k Jan 06, 2023
Official code repository for Continual Learning In Environments With Polynomial Mixing Times

Official code for Continual Learning In Environments With Polynomial Mixing Times Continual Learning in Environments with Polynomial Mixing Times This

Sharath Raparthy 1 Dec 19, 2021
A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

Exploring simple siamese representation learning This is a PyTorch re-implementation of the SimSiam paper on ImageNet dataset. The results match that

Taojiannan Yang 72 Nov 09, 2022
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and t

305 Dec 16, 2022
Official implementation for Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Multi-modal Interaction Graph Convolutioal Network for Temporal Language Localization in Videos Official implementation for Multi-Modal Interaction Gr

Zongmeng Zhang 15 Oct 18, 2022