Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Oral)

Overview

CMT

Code for paper Video Background Music Generation with Controllable Music Transformer (ACM MM 2021 Best Paper Award)

[Paper] [Site]

Directory Structure

  • src/: code of the whole pipeline

    • train.py: training script, take a npz as input music data to train the model

    • model.py: code of the model

    • gen_midi_conditional.py: inference script, take a npz (represents a video) as input to generate several songs

    • src/video2npz/: convert video into npz by extracting motion saliency and motion speed

  • dataset/: processed dataset for training, in the format of npz

  • logs/: logs that automatically generate during training, can be used to track training process

  • exp/: checkpoints, named after val loss (e.g. loss_13_params.pt)

  • inference/: processed video for inference (.npz), and generated music(.mid)

Preparation

  • clone this repo

  • download lpd_5_prcem_mix_v8_10000.npz from HERE and put it under dataset/

  • download pretrained model loss_8_params.pt from HERE and put it under exp/

  • install ffmpeg=3.2.4

  • prepare a Python3 conda environment

    pip install -r py3_requirements.txt
  • prepare a Python2 conda environment (for extracting visbeat)

    • pip install -r py2_requirements.txt
    • open visbeat package directory (e.g. anaconda3/envs/XXXX/lib/python2.7/site-packages/visbeat), replace the original Video_CV.py with src/video2npz/Video_CV.py

Training

  • If you want to use another training set: convert training data from midi into npz under dataset/

    python midi2numpy_mix.py --midi_dir /PATH/TO/MIDIS/ --out_name data.npz 
  • train the model

    python train.py -n XXX -g 0 1 2 3
    
    # -n XXX: the name of the experiment, will be the name of the log file & the checkpoints directory. if XXX is 'debug', checkpoints will not be saved
    # -l (--lr): initial learning rate
    # -b (--batch_size): batch size
    # -p (--path): if used, load model checkpoint from the given path
    # -e (--epochs): number of epochs in training
    # -t (--train_data): path of the training data (.npz file) 
    # -g (--gpus): ids of gpu
    # other model hyperparameters: modify the source .py files

Inference

  • convert input video (MP4 format) into npz (use the Python2 environment)

    cd src/video2npz
    sh video2npz.sh ../../videos/xxx.mp4
    • try resizing the video if this takes a long time
  • run model to generate .mid :

    python gen_midi_conditional.py -f "../inference/xxx.npz" -c "../exp/loss_8_params.pt"
    
    # -c: checkpoints to be loaded
    # -f: input npz file
    # -g: id of gpu (only one gpu is needed for inference) 
    • if using another training set, change decoder_n_class in gen_midi_conditional to the decoder_n_class in train.py
  • convert midi into audio: use GarageBand (recommended) or midi2audio

    • set tempo to the value of tempo in video2npz/metadata.json
  • combine original video and audio into video with BGM

    ffmpeg -i 'xxx.mp4' -i 'yyy.mp3' -c:v copy -c:a aac -strict experimental -map 0:v:0 -map 1:a:0 'zzz.mp4'
    
    # xxx.mp4: input video
    # yyy.mp3: audio file generated in the previous step
    # zzz.mp4: output video
Owner
Zhaokai Wang
Undergraduate student from Beihang University
Zhaokai Wang
PyTorch implementation of DCT fast weight RNNs

DCT based fast weights This repository contains the official code for the paper: Training and Generating Neural Networks in Compressed Weight Space. T

Kazuki Irie 4 Dec 24, 2022
Simple reference implementation of GraphSAGE.

Reference PyTorch GraphSAGE Implementation Author: William L. Hamilton Basic reference PyTorch implementation of GraphSAGE. This reference implementat

William L Hamilton 861 Jan 06, 2023
Data visualization app for H&M competition in kaggle

handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an

Kyohei Uto 12 Apr 30, 2022
Measuring if attention is explanation with ROAR

NLP ROAR Interpretability Official code for: Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Toke

Andreas Madsen 19 Nov 13, 2022
Source code for the NeurIPS 2021 paper "On the Second-order Convergence Properties of Random Search Methods"

Second-order Convergence Properties of Random Search Methods This repository the paper "On the Second-order Convergence Properties of Random Search Me

Adamos Solomou 0 Nov 13, 2021
A Python implementation of global optimization with gaussian processes.

Bayesian Optimization Pure Python implementation of bayesian global optimization with gaussian processes. PyPI (pip): $ pip install bayesian-optimizat

fernando 6.5k Jan 02, 2023
This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".

A Memory-saving Training Framework for Transformers This is the official PyTorch implementation for Mesa: A Memory-saving Training Framework for Trans

Zhuang AI Group 105 Dec 06, 2022
Implementation of "Bidirectional Projection Network for Cross Dimension Scene Understanding" CVPR 2021 (Oral)

Bidirectional Projection Network for Cross Dimension Scene Understanding CVPR 2021 (Oral) [ Project Webpage ] [ arXiv ] [ Video ] Existing segmentatio

Hu Wenbo 135 Dec 26, 2022
Instantaneous Motion Generation for Robots and Machines.

Ruckig Instantaneous Motion Generation for Robots and Machines. Ruckig generates trajectories on-the-fly, allowing robots and machines to react instan

Berscheid 374 Dec 23, 2022
Regularizing Generative Adversarial Networks under Limited Data (CVPR 2021)

Regularizing Generative Adversarial Networks under Limited Data [Project Page][Paper] Implementation for our GAN regularization method. The proposed r

Google 148 Nov 18, 2022
Certis - Certis, A High-Quality Backtesting Engine

Certis - Backtesting For y'all Certis is a powerful, lightweight, simple backtes

Yeachan-Heo 46 Oct 30, 2022
9th place solution in "Santa 2020 - The Candy Cane Contest"

Santa 2020 - The Candy Cane Contest My solution in this Kaggle competition "Santa 2020 - The Candy Cane Contest", 9th place. Basic Strategy In this co

toshi_k 22 Nov 26, 2021
Official PyTorch implementation of StyleGAN3

Modified StyleGAN3 Repo Changes Made tied to python 3.7 syntax .jpgs instead of .pngs for training sample seeds to recreate the 1024 training grid wit

Derrick Schultz (he/him) 83 Dec 15, 2022
HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement This is the unofficial implementation of Vocoder part of

Rishikesh (ऋषिकेश) 118 Dec 29, 2022
Semi-supevised Semantic Segmentation with High- and Low-level Consistency

Semi-supevised Semantic Segmentation with High- and Low-level Consistency This Pytorch repository contains the code for our work Semi-supervised Seman

123 Dec 30, 2022
Text mining project; Using distilBERT to predict authors in the classification task authorship attribution.

DistilBERT-Text-mining-authorship-attribution Dataset used: https://www.kaggle.com/azimulh/tweets-data-for-authorship-attribution-modelling/version/2

1 Jan 13, 2022
Semi-automated OpenVINO benchmark_app with variable parameters

Semi-automated OpenVINO benchmark_app with variable parameters. User can specify multiple options for any parameters in the benchmark_app and the progam runs the benchmark with all combinations of gi

Yasunori Shimura 8 Apr 11, 2022
Simple node deletion tool for onnx.

snd4onnx Simple node deletion tool for onnx. I only test very miscellaneous and limited patterns as a hobby. There are probably a large number of bugs

Katsuya Hyodo 6 May 15, 2022
Implementation for Learning to Track with Object Permanence

Learning to Track with Object Permanence A video-based MOT approach capable of tracking through full occlusions: Learning to Track with Object Permane

Toyota Research Institute - Machine Learning 91 Jan 03, 2023
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences an

Microsoft 8k Jan 04, 2023