Emotional conditioned music generation using transformer-based model.

Related tags

Deep LearningEMOPIA
Overview

This is the official repository of EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation. The paper has been accepted by International Society for Music Information Retrieval Conference 2021.

  • Note: We release the transcribed MIDI files. As for the audio part, due to the copyright issue, we will only release the YouTube ID of the tracks and the timestamp of them. You might use open source crawler to get the audio file.

Use EMOPIA by MusPy

  1. install muspy
pip install muspy
  1. Use it in your script
import muspy

emopia = muspy.EMOPIADataset("data/emopia/", download_and_extract=True)
emopia.convert()
music = emopia[0]
print(music.annotations[0].annotation)

You can get the label of the piece of music:

{'emo_class': '1', 'YouTube_ID': '0vLPYiPN7qY', 'seg_id': '0'}
  • emo_class: ['1', '2', '3', '4']
  • YouTube_ID: the YouTube ID of this piece of music
  • seg_id: means this piece of music is the ith piece we take from this song. (zero-based).

For more usage please refer to MusPy.

Emotion Classification

For the classification models and codes, please refer to this repo.

Conditional Generation

Environment

  1. Install PyTorch and fast transformer:

    • torch==1.7.0 (Please install it according to your CUDA version.)

    • fast transformer :

      pip install --user pytorch-fast-transformers 
      

      or refer to the original repository

  2. Other requirements:

    pip install -r requirements.txt

Usage

Inference

  1. Download the checkpoints and put them into exp/

    • Manually:

    • By commend: (install gdown: pip install gdown)

      #baseline:
      gdown --id 1Q9vQYnNJ0hXBFwcxdWQgDNmzoW3MLl3h --output exp/baseline.zip
      
      # no-pretrained transformer
      gdown --id 1ZULJgBRu2Wb3jxFmGfAHP1v_tjoryFM7 --output exp/no-pretrained_transformer.zip
      
      # pretrained transformer
      gdown --id 19Seq18b2JNzOamEQMG1uarKjj27HJkHu --output exp/pretrained_transformer.zip
      
  2. Inference options:

  • num_songs: number of midis you want to generate.

  • out_dir: the folder where the generated midi will be saved. If not specified, midi files will be saved to exp/MODEL_YOU_USED/gen_midis/.

  • task_type: the task_type needs to be the same as the task specified during training.

    • '4-cls' for 4 class conditioning
    • 'Arousal' for only conditioning on arousal
    • 'Valence' for only conditioning on Valence
    • 'ignore' for not conditioning
  • emo_tag: the target class of emotion you want to assign.

    • If the task_type is '4-cls', emo_tag can be: 1,2,3,4, which refers to Q1, Q2, Q3, Q4.
    • If the task_type is 'Arousal', emo_tag can be: 1, 2. 1 for High arousal, 2 for Low arousal.
    • If the task_type is 'Valence', emo_tag can be: 1, 2. 1 for High Valence, 2 for Low Valence.
  1. Inference

    python main_cp.py --mode inference --task_type 4-cls --load_ckt CHECKPOINT_FOLDER --load_ckt_loss 25 --num_songs 10 --emo_tag 1 
    

Train the model by yourself

  1. Prepare the data follow the steps.

  2. training options:

  • exp_name: the folder name that the checkpoints will be saved.

  • data_parallel: use data_parallel to let the training process faster. (0: not use, 1: use)

  • task_type: the conditioning task:

    • '4-cls' for 4 class conditioning
    • 'Arousal' for only conditioning on arousal
    • 'Valence' for only conditioning on Valence
    • 'ignore' for not conditioning

    a. Only train on EMOPIA: (no-pretrained transformer in the paper)

      python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt none
    

    b. Pre-train the transformer on AILabs17k:

      python main_cp.py --path_train_data ailabs --exp_name YOUR_EXP_NAME --load_ckt none --task_type ignore
    

    c. fine-tune the transformer on EMOPIA: For example, you want to use the pre-trained model stored in 0309-1857 with loss= 30 to fine-tune:

      python main_cp.py --path_train_data emopia --exp_name YOUR_EXP_NAME --load_ckt 0309-1857 --load_ckt_loss 30
    

Baseline

  1. The baseline code is based on the work of Learning to Generate Music with Sentiment

  2. According to the author, the model works best when it is trained with 4096 neurons of LSTM, but takes 12 days for training. Therefore, due to the limit of computational resource, we used the size of 512 neurons instead of 4096.

  3. In order to use this as evaluation against our model, the target emotion classes is expanded to 4Q instead of just positive/negative.

Authors

The paper is a co-working project with Joann, SeungHeon and Nabin. This repository is mentained by Joann and me.

License

The EMOPIA dataset is released under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). It is provided primarily for research purposes and is prohibited to be used for commercial purposes. When sharing your result based on EMOPIA, any act that defames the original music owner is strictly prohibited.

The hand drawn piano in the logo comes from Adobe stock. The author is Burak. I purchased it under standard license.

Cite the dataset

@inproceedings{{EMOPIA},
         author = {Hung, Hsiao-Tzu and Ching, Joann and Doh, Seungheon and Kim, Nabin and Nam, Juhan and Yang, Yi-Hsuan},
         title = {{MOPIA}: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation},
         booktitle = {Proc. Int. Society for Music Information Retrieval Conf.},
         year = {2021}
}
Owner
hung anna
hung anna
Oriented Object Detection: Oriented RepPoints + Swin Transformer/ReResNet

Oriented RepPoints for Aerial Object Detection The code for the implementation of “Oriented RepPoints + Swin Transformer/ReResNet”. Introduction Based

96 Dec 13, 2022
Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech

Meta-TTS: Meta-Learning for Few-shot SpeakerAdaptive Text-to-Speech This repository is the official implementation of "Meta-TTS: Meta-Learning for Few

Sung-Feng Huang 128 Dec 25, 2022
VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data

VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data Introduction Requirements Installation and Setup Supported Hardware and Software R

SigmaLab 1 Jun 14, 2022
PyTorch - Python + Nim

Master Release Pytorch - Py + Nim A Nim frontend for pytorch, aiming to be mostly auto-generated and internally using ATen. Because Nim compiles to C+

Giovanni Petrantoni 425 Dec 22, 2022
Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes, ICCV 2017

AdaptationSeg This is the Python reference implementation of AdaptionSeg proposed in "Curriculum Domain Adaptation for Semantic Segmentation of Urban

Yang Zhang 128 Oct 19, 2022
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021)

PGpoints Pytorch implementation of the paper Progressive Growing of Points with Tree-structured Generators (BMVC 2021) Hyeontae Son, Young Min Kim Pre

Hyeontae Son 9 Jun 06, 2022
Generalized Decision Transformer for Offline Hindsight Information Matching

Generalized Decision Transformer for Offline Hindsight Information Matching [arxiv] If you use this codebase for your research, please cite the paper:

Hiroki Furuta 35 Dec 12, 2022
GluonMM is a library of transformer models for computer vision and multi-modality research

GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models and also research work from Amazon

42 Dec 02, 2022
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

BasicVSR BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond Ported from https://github.com/xinntao/BasicSR Dependencie

Holy Wu 8 Jun 07, 2022
IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling

IEEE-CIS Technical Challenge on Predict+Optimize for Renewable Energy Scheduling This is my code, data and approach for the IEEE-CIS Technical Challen

3 Sep 18, 2022
Image morphing without reference points by applying warp maps and optimizing over them.

Differentiable Morphing Image morphing without reference points by applying warp maps and optimizing over them. Differentiable Morphing is machine lea

Alex K 380 Dec 19, 2022
Narya API allows you track soccer player from camera inputs, and evaluate them with an Expected Discounted Goal (EDG) Agent

Narya The Narya API allows you track soccer player from camera inputs, and evaluate them with an Expected Discounted Goal (EDG) Agent. This repository

Paul Garnier 121 Dec 30, 2022
RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds

RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds This repository contains the code asscoiated

Felix Hensel 14 Dec 12, 2022
Propose a principled and practically effective framework for unsupervised accuracy estimation and error detection tasks with theoretical analysis and state-of-the-art performance.

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles This project is for the paper: Detecting Errors and Estimating

Jiefeng Chen 13 Nov 21, 2022
Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs This repository contains code to accompany the paper "Hierarchical Clustering: O

3 Sep 25, 2022
PyTorch implementation of Off-policy Learning in Two-stage Recommender Systems

Off-Policy-2-Stage This repo provides a PyTorch implementation of the MovieLens experiments for the following paper: Off-policy Learning in Two-stage

Jiaqi Ma 25 Dec 12, 2022
pytorch implementation of ABC : Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning

ABC:Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning, NeurIPS 2021 pytorch implementation of ABC : Auxiliary Balanced Class

Hyuck Lee 25 Dec 22, 2022
ParmeSan: Sanitizer-guided Greybox Fuzzing

ParmeSan: Sanitizer-guided Greybox Fuzzing ParmeSan is a sanitizer-guided greybox fuzzer based on Angora. Published Work USENIX Security 2020: ParmeSa

VUSec 158 Dec 31, 2022
A PyTorch library for Vision Transformers

VFormer A PyTorch library for Vision Transformers Getting Started Read the contributing guidelines in CONTRIBUTING.rst to learn how to start contribut

Society for Artificial Intelligence and Deep Learning 142 Nov 28, 2022