wav2vec_finetune

Test finetuning of XLSR (multilingual wav2vec 2.0) for other speech classification tasks

Initial test: gender recognition on the CaFE dataset (downloaded in the setup steps below).
Finetune for autism detection
[] Clean up directory
[] Make training and evaluation scripts runnable with cmd line / shell scripts
[] Add random noise on training samples
[] Make baseline models
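
A rough sketch of the "add random noise on training samples" item, assuming white Gaussian noise mixed in at a chosen signal-to-noise ratio. The function name `add_gaussian_noise` and the `snr_db` parameter are hypothetical and not part of this repo. It uses only the standard library and plain lists of floats so it runs anywhere; real training code would more likely vectorize this on arrays or tensors.

```python
import math
import random


def add_gaussian_noise(samples, snr_db=20.0, rng=None):
    """Return a noisy copy of `samples` (a list of floats).

    White Gaussian noise is scaled so that the ratio of signal power to
    noise power is approximately `snr_db` decibels.
    """
    rng = rng or random.Random()
    if len(samples) == 0:
        return []
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in samples) / len(samples)
    if signal_power == 0.0:
        # Silent input: there is no signal level to scale the noise against.
        return list(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in samples]
```

Passing a seeded `random.Random` makes the augmentation reproducible; drawing `snr_db` at random per training sample is one way to vary the noise level.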

# make virtual env
pip install -r requirements.txt

mkdir data
mkdir preproc_data
mkdir model
cd data
wget -O CaFE_48k.zip "https://zenodo.org/record/1219621/files/CaFE_48k.zip?download=1"
unzip CaFE_48k.zip
cd ..

python preproc.py
python train.py
python evaluate.py

Updates

11/9: Success! Trained a sex classifier on a small dataset. Performance is so-so, but everything seems to work.

TODO

Chunk audio files: make predictions on fixed-length chunks of, e.g., 5 seconds
Set up benchmark models
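
One possible shape for the chunking item, as a sketch only: split each recording into consecutive fixed-length chunks, classify each chunk, then average the per-chunk class probabilities into one file-level prediction. The names `chunk_audio` and `average_probabilities` are hypothetical, and averaging is just one assumed way to combine chunk predictions. The functions only rely on `len()` and slicing, so they should work on lists as well as 1-D arrays.

```python
def chunk_audio(samples, sample_rate, chunk_seconds=5.0, drop_short=False):
    """Split a 1-D sequence of samples into consecutive fixed-length chunks.

    The last chunk may be shorter than the others unless `drop_short` is True.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    if chunk_len <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least one sample")
    chunks = [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
    if drop_short and chunks and len(chunks[-1]) < chunk_len:
        chunks.pop()
    return chunks


def average_probabilities(chunk_probs):
    """Average per-chunk class probabilities into one file-level distribution."""
    n_chunks = len(chunk_probs)
    n_classes = len(chunk_probs[0])
    return [sum(p[c] for p in chunk_probs) / n_chunks for c in range(n_classes)]
```

For example, `chunk_audio(waveform, sample_rate=16000)` would yield 5-second chunks of 80,000 samples each, assuming the audio has already been resampled to 16 kHz.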

Resources:

https://github.com/pytorch/fairseq/blob/master/examples/xlmr/README.md
https://arxiv.org/abs/2006.13979
https://huggingface.co/transformers/training.html
https://huggingface.co/blog/fine-tune-xlsr-wav2vec2
https://discuss.huggingface.co/t/german-asr-fine-tuning-wav2vec2/4558/5
https://huggingface.co/docs/datasets/loading_datasets.html#from-local-files
https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
https://github.com/m3hrdadfi/soxan
https://www.zhaw.ch/storage/engineering/institute-zentren/cai/BA21_Speech_Classification_Reiser_Fivian.pdf
https://github.com/DReiser7/w2v_did
https://github.com/ARBML/klaam
https://github.com/talhanai/speech-nlp-datasets

Notes:

Look into SpecAugment for finetuning: https://arxiv.org/abs/1904.08779 (already enabled by default in the Hugging Face wav2vec 2.0 config)
How should the prediction be made?
- Is there a better classification head than a small feedforward projection (e.g., an LSTM)?
