Text-to-Music Retrieval using Pre-defined/Data-driven Emotion Embeddings

Last update: Dec 05, 2022

Text2Music Emotion Embedding

Reference

Emotion Embedding Spaces for Matching Music to Stories, ISMIR 2021 [paper]

-- Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, and Xavier Serra

@inproceedings{won2021emotion,
  title={Emotion embedding spaces for matching music to stories},
  author={Won, Minz and Salamon, Justin and Bryan, Nicholas J. and Mysore, Gautham J. and Serra, Xavier},
  booktitle={ISMIR},
  year={2021}
}

Requirements

conda create -n YOUR_ENV_NAME python=3.7
conda activate YOUR_ENV_NAME
pip install -r requirements.txt

Data

Collect the audio files of the AudioSet mood subset (link).
Read the audio files and store them in .npy format.
Other relevant data, including Alm's dataset (original link), the ISEAR dataset (original link), emotion embeddings, pretrained Word2Vec, and data splits, are all available here (link).
Unzip ttm_data.tar.gz and place the extracted data folder under text2music-emotion-embedding/.
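
The audio-to-.npy step could look roughly like the sketch below. It is only an illustration under stated assumptions: it assumes the clips have already been converted to 16-bit PCM WAV, it downmixes to mono, and it does not resample. The function name wav_to_npy and the output naming are hypothetical; the sample rate, channel layout, and file naming that the training code actually expects are not stated here, so check the data loader before relying on it.

```python
import wave
from pathlib import Path

import numpy as np


def wav_to_npy(wav_path, out_dir):
    """Decode a 16-bit PCM WAV file to mono float32 and save it as .npy.

    Hypothetical helper: the format expected by the repository's data
    loader (sample rate, mono/stereo, file names) is an assumption.
    """
    wav_path, out_dir = Path(wav_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with wave.open(str(wav_path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    # Interleaved int16 samples -> (n_frames, n_channels), scaled to [-1, 1).
    samples = np.frombuffer(frames, dtype="<i2").reshape(-1, n_channels)
    mono = samples.astype(np.float32).mean(axis=1) / 32768.0

    out_path = out_dir / (wav_path.stem + ".npy")
    np.save(out_path, mono)
    return out_path
```

Calling wav_to_npy on each WAV file in a folder produces one .npy file per clip, and the output folder would then be what you pass as --data_path, assuming the loader reads files in this layout.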

Training

Here is an example for training a metric learning model.

python3 src/metric_learning/main.py \
        --dataset 'isear' \
        --num_branches 3 \
        --data_path YOUR_DATA_PATH_TO_AUDIOSET

For more examples, check the bash files under the scripts folder.
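
For readers new to metric learning, the sketch below shows a generic triplet loss on cosine similarity, a common objective for this kind of model. It is not taken from this repository: whether src/metric_learning uses this exact formulation is an assumption, and the function names and the margin value of 0.4 are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Mean hinge loss: max(0, margin - sim(a, p) + sim(a, n)).

    The margin is an illustrative value, not the repository's setting.
    """
    pos_sim = cosine_similarity(anchor, positive)
    neg_sim = cosine_similarity(anchor, negative)
    return float(np.mean(np.maximum(0.0, margin - pos_sim + neg_sim)))
```

The loss is zero once each anchor (for example, a text embedding) is closer to its positive (a music clip with the same emotion) than to its negative by at least the margin.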

Test

Here is an example of running the test.

python3 src/metric_learning/main.py \
        --mode 'TEST' \
        --dataset 'alm' \
        --model_load_path 'data/pretrained/alm_cross.ckpt' \
        --data_path 'YOUR_DATA_PATH_TO_AUDIOSET'

Pretrained three-branch metric learning models (alm_cross.ckpt and isear_cross.ckpt) are included in ttm_data.tar.gz. The command above runs as written once the unzipped data folder is placed under text2music-emotion-embedding/.

Visualization

The embedding distribution of each model can be projected onto a 2-dimensional space. We used uniform manifold approximation and projection (UMAP) to visualize the distribution, since UMAP is generally reported to preserve more of the global structure than t-SNE.
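
A projection of this kind could be sketched as follows. This is not the repository's visualization script: the function name project_2d and the seed are hypothetical, and UMAP is used with its default settings from the umap-learn package. So that the sketch still runs where umap-learn is not installed, it falls back to a plain PCA projection, which is a different technique and is only a stand-in.

```python
import numpy as np


def project_2d(embeddings, seed=0):
    """Project (n_samples, dim) embeddings to (n_samples, 2) for plotting."""
    embeddings = np.asarray(embeddings, dtype=np.float32)
    try:
        import umap  # provided by the umap-learn package
    except ImportError:
        # Fallback for environments without umap-learn: plain PCA via SVD.
        centered = embeddings - embeddings.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return centered @ vt[:2].T
    # Default UMAP hyperparameters; the settings used for the paper's
    # figures are not stated here.
    reducer = umap.UMAP(n_components=2, random_state=seed)
    return reducer.fit_transform(embeddings)
```

The returned array has one 2-dimensional point per embedding, which can be passed to any scatter plot routine and colored by emotion label.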

Demo

Listen to some retrieval examples produced by the three-branch metric learning model [Soundcloud].

License

Some License
