A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

Last update: Oct 23, 2022

Related tags

Overview

wav2vec-toolkit

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

This repository accompanies the 🤗 HuggingFace Community Paper on finetuning Wav2Vec2 XLSR for low-resource languages [link]

How to contribute

(Mostly identical to the huggingface/datasets contributing guide)

Fork the repository by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

Clone your fork to your local disk, and add the base repository as a remote:

git clone [email protected]:<your Github handle>/wav2vec-toolkit.git
cd wav2vec-toolkit
git remote add upstream https://github.com/anton-l/wav2vec-toolkit.git

Create a new branch to hold your development changes:
```
git checkout -b a-descriptive-name-for-my-changes
```
do not work on the master branch.
Set up a development environment by running the following command in a virtual environment:
```
pip install -e ".[dev]"
```
(If wav2vec-toolkit was already installed in the virtual environment, remove it with pip uninstall wav2vec_toolkit before reinstalling it in editable mode with the -e flag.)
Develop the features on your branch.
Format your code. Run black and isort so that your newly added files look nice with the following command:
```
black --line-length 119 --target-version py36 src scripts
isort src scripts
```
Once you're happy with your implementation, add your changes and make a commit to record your changes locally:
```
git add .
git commit
```
It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:
```
git fetch upstream
git rebase upstream/main
```
Push the changes to your account using:
```
git push -u origin a-descriptive-name-for-my-changes
```
Once you are satisfied, go the webpage of your fork on GitHub. Click on "Pull request" to send your to the project maintainers for review.

A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models

Related tags

Overview

wav2vec-toolkit

How to contribute

Owner

Anton Lozhkov

Associated Repository for "Translation between Molecules and Natural Language"

Generate product descriptions, blogs, ads and more using GPT architecture with a single request to TextCortex API a.k.a Hemingwai

TaCL: Improve BERT Pre-training with Token-aware Contrastive Learning

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

In this repository we have tested 3 VQA models on the ImageCLEF-2019 dataset.

The official implementation of "BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?, ACL 2021 main conference"

Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Mlcode - Continuous ML API Integrations

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Visual Automata is a Python 3 library built as a wrapper for Caleb Evans' Automata library to add more visualization features.

Conversational text Analysis using various NLP techniques

PyABSA - Open & Efficient for Framework for Aspect-based Sentiment Analysis

Arabic-Phonetic-Output - You can input the phonetic version of any Arabic text here. This software will show you output in Arabic (with vowels)

Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

Chatbot with Pytorch, Python & Nextjs

CCKS-Title-based-large-scale-commodity-entity-retrieval-top1

nlp-tutorial is a tutorial for who is studying NLP(Natural Language Processing) using Pytorch

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.