Official implementation of A cappella: Audio-visual Singing Voice Separation, from BMVC21

Overview

Y-Net

Official implementation of A cappella: Audio-visual Singing Voice Separation, British Machine Vision Conference 2021

Project page: ipcv.github.io/Acappella/
Paper: arXiv, BMVC (not available yet)

Running a demo / Y-Net Inference

We provide simple functions to load models with pre-trained weights. Steps:

  1. Clone the repo or download y-net>VnBSS>models (models can run as a standalone package)
  2. Load a model:
from VnBSS import y_net_gr # or from models import y_net_gr 
model = y_net_gr(n=1)
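
The two import paths in the comment above correspond to the two ways of getting the code: the full repo exposes the loader through VnBSS, while the standalone folder exposes it through models. A minimal sketch of a fallback import that works in either layout is shown here. The helper `import_first` is hypothetical and not part of the repo.

```python
import importlib


def import_first(module_names, attr):
    """Return `attr` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not importable from here, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of {list(module_names)}")


# Intended usage with this repo (requires the code to be on sys.path):
#   y_net_gr = import_first(("VnBSS", "models"), "y_net_gr")
#   model = y_net_gr(n=1)
```

The usage lines are left as comments because they only run once the repo (or the models folder) is importable.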

A fully working demo is available in Colab.

Citation

@inproceedings{acappella,
    author    = {Juan F. Montesinos and
                 Venkatesh S. Kadandale and
                 Gloria Haro},
    title     = {A cappella: Audio-visual Singing Voice Separation},
    booktitle = {British Machine Vision Conference (BMVC)},
    year      = {2021},
}

Repository under construction.

Training / Using DEV code

Training

The most difficult part is preparing the dataset, as everything is built upon a very specific format.
To run training:
python run.py -m model_name --workname experiment_name --arxiv_path directory_of_experiments --pretrained_from path_pret_weights
You can inspect the available arguments in default.py>argparse_default.
Possible model names are: y_net_g, y_net_gr, y_net_m, y_net_r, u_net, llcp
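
When launching several experiments, it can help to assemble the command above programmatically and reject misspelled model names early. The following is a minimal sketch, not part of the repo: the helper `build_train_command` is hypothetical, and it assumes every flag other than -m is optional (the debug instructions in this README run `python run.py -m y_net_gr` on its own).

```python
import shlex

# Model names as listed in this README.
MODEL_NAMES = ("y_net_g", "y_net_gr", "y_net_m", "y_net_r", "u_net", "llcp")


def build_train_command(model, workname=None, arxiv_path=None, pretrained_from=None):
    """Assemble the `python run.py ...` argument list for one training run."""
    if model not in MODEL_NAMES:
        raise ValueError(f"unknown model {model!r}; expected one of {MODEL_NAMES}")
    cmd = ["python", "run.py", "-m", model]
    # Optional flags are only appended when a value is given (assumption).
    for flag, value in (("--workname", workname),
                        ("--arxiv_path", arxiv_path),
                        ("--pretrained_from", pretrained_from)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_train_command("y_net_gr", workname="exp1", arxiv_path="./experiments")
print(shlex.join(cmd))  # shell-ready string for copy/paste
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repo root.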

Testing

  1. Go to manuscript_scripts and replace the checkpoint paths with yours in the testing scripts.
  2. Run: bash manuscript_scripts/test_gr_r.sh
  3. Replace the paths in manuscript_scripts/auto_metrics.py with your experiment_directory path.
  4. Run: python manuscript_scripts/auto_metrics.py to visualise results.

It's a complicated framework. HELP!

The best way to understand the framework is to debug it! Having runnable code helps you see input shapes and the dataflow, and lets you run it line by line. Download The circle of life demo with the files already processed; it acts as a dataset of 6 samples. You can download it from Google Drive (1.1 GB).

  1. Unzip the file
  2. Run python run.py -m y_net_gr (for example)

Everything has been configured to run by default this way.

The model

Each effective model is wrapped by an nn.Module which takes care of computing the STFT and the mask, returning the waveform, and so on. This wrapper can be found at VnBSS>models>y_net.py>YNet. To get rid of it, you can simply inherit from the class, keep only the minimum layers and keep the core_forward method, which is the inference step without the miscellanea.
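
The inheritance pattern described above can be illustrated with a toy, pure-Python stand-in. This is only a sketch of the design: `Wrapper` and `CoreOnly` are hypothetical classes, not the repo's YNet, and simple list arithmetic replaces the STFT and the network layers. Only the method name core_forward comes from this README.

```python
class Wrapper:
    """Toy stand-in for the wrapper: pre-process, core step, post-process."""

    def __init__(self, gain=0.5):
        self.gain = gain  # stands in for the network's learned layers

    def to_features(self, waveform):
        # Stand-in for the STFT: here just the magnitude of each sample.
        return [abs(x) for x in waveform]

    def core_forward(self, features):
        # Stand-in for the inference step: one mask value per feature.
        return [min(1.0, self.gain * f) for f in features]

    def forward(self, waveform):
        # Full pipeline: features -> mask -> masked waveform.
        mask = self.core_forward(self.to_features(waveform))
        return [m * x for m, x in zip(mask, waveform)]


class CoreOnly(Wrapper):
    """Subclass that keeps only the core step and skips the extras."""

    def forward(self, features):
        return self.core_forward(features)


full = Wrapper().forward([1.0, -2.0])   # waveform in, masked waveform out
core = CoreOnly().forward([1.0, 2.0])   # features in, mask out
```

With the real wrapper, the same idea would apply to an nn.Module subclass whose forward delegates straight to core_forward; the inputs it expects then depend on the actual implementation.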

FAQs

  1. How to change the optimizer's hyperparameters?
    Go to config>optimizer.json
  2. How to change clip duration, video framerate, STFT parameters or audio samplerate?
    Go to config>__init__.py
  3. How to change the batch size or the amount of epochs?
    Go to config>hyptrs.json
  4. How to dump predictions from the training and test set?
    Go to default.py and modify DUMP_FILES (it can be controlled at a subset level). The force argument skips the iteration-wise conditions and dumps every single network prediction.
  5. Is tensorboard enabled?
    Yes, you will find tensorboard records at your_experiment_directory/used_workname/tensorboard
  6. Can I resume an experiment?
    Yes, if you set exactly the same experiment folder and workname, the system will detect it and will resume from there.
  7. I'm trying to resume but found an AssertionError
    This can occur if there is an exception before running the model.
  8. How to change the number of layers of U-Net?
    U-Net is built dynamically given a list of layers per block, as shown in models>__init__.py, from outer to inner blocks.
  9. How to modify the default network values?
    The json file config>net_cfg.json overwrites any default configuration from the model.
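
The experiment layout mentioned in FAQs 5 and 6 can be sketched as follows. The two helpers are hypothetical and only mirror the path given above (your_experiment_directory/used_workname/tensorboard). How the framework itself detects a resumable run is not shown in this README, so the folder check is an assumption.

```python
from pathlib import Path


def experiment_paths(experiment_directory, workname):
    """Return the run folder and its tensorboard folder for one experiment."""
    run_dir = Path(experiment_directory) / workname
    return {"run": run_dir, "tensorboard": run_dir / "tensorboard"}


def looks_resumable(experiment_directory, workname):
    """Heuristic (assumption): an existing run folder suggests a resume."""
    return experiment_paths(experiment_directory, workname)["run"].is_dir()
```

The tensorboard path returned here is the one to pass to `tensorboard --logdir`.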

Owner

Juan F. Montesinos, PhD student at Pompeu Fabra University, Barcelona