HF's ML for Audio study group

Last update: Jan 01, 2023

Overview

Hugging Face Machine Learning for Audio Study Group

Welcome to the ML for Audio Study Group. Through a series of presentations, paper readings, and discussions, we'll explore how Machine Learning is applied in the audio domain. Some example tasks are:

Generating synthetic sound from a given text (think of conversational assistants).
Transcribing audio signals to text.
Removing noise from an audio signal.
Separating different sources of audio.
Identifying which speaker is talking.
And much more!

We suggest you join the community Discord at http://hf.co/join/discord, and we look forward to meeting you in the #ml-4-audio-study-group channel 🤗. Remember, this is a community effort, so make this space your own!

Organisation

We'll kick off with some basics and then collaboratively decide the further direction of the group.

Before each session:

Read/watch related resources

During each session, you can:

Ask questions in the forum
Give a short (~10-15 min) presentation on the topic (agree on this beforehand)

Before/after:

Keep discussing and asking questions about the topic (#ml-4-audio-study channel on Discord)
Share interesting resources

Schedule

Date	Topics	Resources (to read beforehand)
Dec 14, 2021	Kickoff + Overview of Audio-related use cases (video, questions)	The 3 DL Frameworks for e2e Speech Recognition that power your devices
Dec 21, 2021	Intro to Audio + Automatic Speech Recognition Deep Dive (video, questions)	Intro to Audio for FastAI Sections 1 and 2; Speech and Language Processing 26.1-26.5
Jan 4, 2022	Text to Speech Deep Dive (video, questions)	Intro to Audio & ASR Notebooks; Speech and Language Processing 26.6
Jan 18, 2022	pyctcdecode: A simple & fast STT prediction decoding algorithm (demo, slides, questions)	Beam search CTC decoding; pyctcdecode

Supplementary Resources

These resources are for when you want to solidify a concept or go further down the speech processing rabbit hole.

General Resources

Slides from LSA352: Slides (no videos available)
Slides from CS224S (Latest): Slides (no videos available)
Speech & Language Processing Book (Chapters 25 & 26) - E-book

Research Papers

Speech Recognition Papers: GitHub repo
Speech Synthesis Papers: GitHub repo

Toolkits

SpeechBrain - GitHub repo
Toucan - GitHub repo
ESPnet - GitHub repo

Demos

Add interesting effects to your audio files - Hugging Face Spaces
Generate speech from text (TTS) - Hugging Face Spaces
Generate text from speech (ASR) - Hugging Face Spaces

Owner

Vaibhav Srivastav
