SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Last update: Nov 20, 2021

Related tags

Text Data & NLP SASE

Overview

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

We propose a SASE model with adaptive noise distribution, which achieves state of the art results on the VioceBank+DEMAND dataset.
We simulated the federated learning setting of a real environment and verified the robustness of the proposed SASE noise reduction model in a real environment through experiments and visualization.
The proposed SASE model is computed based on the complex domain, and the TF-GA block is used to extract richer information of speech distribution and noise distribution, while SA-GOEA and SA-GUEA are adaptive to learn the distribution mask of noise.
In this paper, we propose a model aggregation optimization weighting strategy that is more applicable to FLbased speech enhancement tasks.

Dependencies

python >=3.6 (3.8.5 was used in the experiments)
PyTorch == 1.10.0+cu113
flwr == 2.0.1

How to run the code

1. Prepare data

VoiceBank+DEMAND can be accessed from this [link](## SUPERSEDED: THIS DATASET HAS BEEN REPLACED. ## Noisy speech database for training speech enhancement algorithms and TTS models)
CommonVoice(Chinese) link +Noise92 [link](NOISEX (cmu.edu))

2. Train on the VoiceBank+DEMAND dataset

python main.py

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

./run-server.sh
./run-client.sh
- You can change the number of clients by changing NUM_CLIENTS

4. Generate wav files and evaluate

python main.py -g --resume "model_file" -df "wavs_root"

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Related tags

Overview

SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Dependencies

How to run the code

1. Prepare data

2. Train on the VoiceBank+DEMAND dataset

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

4. Generate wav files and evaluate

Result

1. Evaluate on VoiceBank+DEMAND dataset

2. Evaluate on CommonVoice+Noise92 dataset

Owner

Tower

Trains an OpenNMT PyTorch model and SentencePiece tokenizer.

Kestrel Threat Hunting Language

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

State of the art faster Natural Language Processing in Tensorflow 2.0 .

Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

DeepPavlov Tutorials

🗣️ NALP is a library that covers Natural Adversarial Language Processing.

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Synthetic data for the people.

VMD Audio/Text control with natural language

Text to speech is a process to convert any text into voice. Text to speech project takes words on digital devices and convert them into audio. Here I have used Google-text-to-speech library popularly known as gTTS library to convert text file to .mp3 file. Hope you like my project!

Download videos from YouTube/Twitch/Twitter right in the Windows Explorer, without installing any shady shareware apps

Natural Language Processing Specialization

Convolutional Neural Networks for Sentence Classification

A framework for implementing federated learning

Awesome-NLP-Research (ANLP)

Guide to using pre-trained large language models of source code

Yet Another Neural Machine Translation Toolkit

Train 🤗-transformers model with Poutyne.

A notebook that shows how to import the IITB English-Hindi Parallel Corpus from the HuggingFace datasets repository