Binaural Speech Synthesis

Last update: Dec 18, 2022

Overview

This repository contains code to train a mono-to-binaural neural sound renderer. If you use this code or the provided dataset, please cite our paper "Neural Synthesis of Binaural Speech from Mono Audio":

@inproceedings{richard2021binaural,
  title={Neural Synthesis of Binaural Speech from Mono Audio},
  author={Richard, Alexander and Markovic, Dejan and Gebru, Israel D and Krenn, Steven and Butler, Gladstone and de la Torre, Fernando and Sheikh, Yaser},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Code

Detailed instructions on how to use the code will be released prior to ICLR 2021.

Dataset

The dataset will be released prior to ICLR 2021.

License

The code and dataset are released under the CC-NC 4.0 International license.

Comments

UserWarning: stft will soon require the return_complex parameter be given for real inputs

Hello, when I run train.py, there is always a warning:

UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:639.)
  return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore

then nothing else.

Could you help me solve it?

opened by yijingshihenxiule 7
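
The message quoted in the issue above is a UserWarning, which Python reports but does not raise as an error under its default settings, so the warning alone does not stop a script. The text of the warning asks callers of stft to pass return_complex explicitly. Until the calling code is updated, one option is to filter this specific message with the standard warnings module. The sketch below is a minimal illustration only: noisy_call is a hypothetical stand-in for whatever code emits the warning, and call_quietly is a helper name invented here, not part of the repository or of PyTorch.

```python
import warnings


def noisy_call():
    # Hypothetical stand-in for a library call that emits the warning from the issue.
    warnings.warn(
        "stft will soon require the return_complex parameter be given for real inputs",
        UserWarning,
    )
    return "ok"


def call_quietly(fn):
    # Temporarily ignore only UserWarnings whose message starts with the given text.
    # The filter is removed again when the "with" block exits.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="stft will soon require",
            category=UserWarning,
        )
        return fn()


result = call_quietly(noisy_call)
```

Filtering hides the notice without changing what the underlying call returns, so it does not say anything about whether training is otherwise progressing.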
about the coordinate system and the pretrained network

Hi, thanks for your great work!

Since I am a beginner in audio and 3D, if you don't mind, I have some questions (that might be evident to you):

You said that

Receiver positions are therefore the same at all times. The transmitter is in the origin of the coordinate system and, from the receiver's perspective, x points forward, y points right, and z points up.

I took a look at the dataset, and I guess that rx_positions holds the positions of the receiver and tx_positions holds the positions of the sound transmitter. If the origin of the coordinate system is at the transmitter, then why are the rx_positions all zeros in (x, y, z)?

My other question is about the network: will you release the pretrained model? If not, can the provided training code produce similarly good results?

And how does the network generalize? For example, what if I change the mono audio and the positions during inference? I have mono audio and 3D positions of my own, but I cannot fine-tune the model because I don't have ground-truth binaural audio.

Thanks for your reply and again great work!

opened by yihongXU 0
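
The axis convention quoted in the issue above can be made concrete with a small sketch. It assumes the receiver sits at the origin (which is what all-zero rx_positions would suggest) and that, from the receiver's perspective, x points forward, y points right, and z points up, as quoted. The function name and the azimuth/elevation convention are illustrative choices made here, not part of the repository.

```python
import math


def direction_from_receiver(x, y, z):
    """Return (azimuth_deg, elevation_deg, distance) of a transmitter at (x, y, z).

    Assumed convention: receiver at the origin, x forward, y right, z up.
    Azimuth is 0 straight ahead and positive toward the right;
    elevation is 0 at ear level and positive above.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth_deg = math.degrees(math.atan2(y, x))
    elevation_deg = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth_deg, elevation_deg, distance


# A transmitter one unit ahead of the receiver and one unit to its right.
azimuth, elevation, distance = direction_from_receiver(1.0, 1.0, 0.0)
```

Under these assumptions, a transmitter equally far ahead and to the right lies at an azimuth of about 45 degrees and an elevation of 0.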
Adding Code of Conduct file

This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

This PR was crafted with love by Facebook's Open Source Team.
CLA Signed

opened by facebook-github-bot 0
Adding Contributing file

This pull request was created automatically because we noticed your project was missing a Contributing file.

CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

This PR was crafted with love by Facebook's Open Source Team.
CLA Signed

opened by facebook-github-bot 0

Releases(video_v1.0)

video_v1.0(Jun 21, 2021)

This release contains silent top-view videos of each test sequence. You can overlay these videos with binaural audio generated by your model to generate top-view visualizations similar to those used in our supplemental video.
Source code(tar.gz)
Source code(zip)
subject1.mp4(1.93 MB)
subject2.mp4(1.81 MB)
subject3.mp4(1.96 MB)
subject4.mp4(1.93 MB)
subject5.mp4(2.01 MB)
subject6.mp4(1.87 MB)
subject7.mp4(1.90 MB)
subject8.mp4(1.72 MB)
validation.mp4(1.45 MB)
v1.1(Jun 21, 2021)
This release contains two pre-trained binaural networks:

a small model with a single WaveNet block for faster experiments;

a large model with three WaveNet blocks as in the ICLR paper.

Source code(tar.gz)
Source code(zip)
binaural_network_1block.net(11.05 MB)
binaural_network_3blocks.net(32.96 MB)
v1.0(Apr 30, 2021)

Download the binaural dataset here.
Source code(tar.gz)
Source code(zip)
binaural_dataset.zip(1235.99 MB)

Owner

Facebook Research
