A PyTorch Implementation of End-to-End Models for Speech-to-Text

Last update: Dec 25, 2022


Overview

speech

Speech is an open-source package for building end-to-end models for automatic speech recognition. It currently supports sequence-to-sequence models with attention, Connectionist Temporal Classification (CTC), and the RNN Sequence Transducer.
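To illustrate what a CTC model's output means, here is a minimal sketch of greedy (best-path) CTC decoding. It assumes the blank symbol has index 0 and that the model emits one row of per-label scores per frame. This is a generic illustration of the technique, not code taken from this package. The helper names ctc_greedy_decode and collapse_ctc_path are hypothetical.

```python
from typing import List, Sequence


def collapse_ctc_path(path: Sequence[int], blank: int = 0) -> List[int]:
    """Apply the CTC collapsing rule: merge repeated labels, then drop blanks."""
    decoded: List[int] = []
    prev = None
    for label in path:
        # A label is emitted only when it differs from the previous frame's
        # label and is not the blank; a blank between two identical labels
        # therefore allows that label to be emitted twice.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Pick the highest-scoring label at each frame, then collapse the path."""
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    return collapse_ctc_path(best_path, blank)


if __name__ == "__main__":
    # Hypothetical scores for 5 frames over 3 labels (index 0 is the blank).
    scores = [
        [0.6, 0.3, 0.1],
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores))  # best path 0,1,1,0,2 collapses to [1, 2]
```

Greedy decoding is the simplest choice; beam search over the same outputs usually gives better results but needs more code.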

The goal of this software is to facilitate research in end-to-end models for speech recognition. The models are implemented in PyTorch.

The software has only been tested with Python 3.6.

We will not be providing backward compatibility for Python 2.7.

Install

We recommend creating a virtual environment and installing the Python requirements there.

virtualenv <path_to_your_env>
source <path_to_your_env>/bin/activate
pip install -r requirements.txt

Then follow the PyTorch installation instructions for a version that works on your machine.

After all the Python requirements are installed, run the following from the top-level directory:

make

The build process requires CMake as well as Make.
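Before running make, a quick check that the build tools and PyTorch are visible can save a failed build. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is hypothetical and not part of this repository, and it only checks that the tools are on PATH and that a torch module is importable, not their versions.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether CMake, Make, and PyTorch appear to be available."""
    # shutil.which returns None when an executable is not found on PATH.
    status = {tool: shutil.which(tool) is not None for tool in ("cmake", "make")}
    # find_spec returns None when the package cannot be imported.
    status["torch"] = importlib.util.find_spec("torch") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:6s} {'found' if ok else 'MISSING'}")
```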

After that, source setup.sh from the repo root.

source setup.sh

Consider adding this line to your ~/.bashrc so that it runs in every new shell.

You can verify that the installation succeeded by running the tests from the tests directory.

cd tests
pytest

Run

To train a model, run:

python train.py <path_to_config>

After the model has finished training, you can evaluate it with:

python eval.py <path_to_model> <path_to_data_json>

To see the available options for each script use -h:

python {train, eval}.py -h

Examples

For examples of model configurations and datasets, see the examples directory. Each example dataset should include instructions and/or scripts for downloading and preparing the data, along with one or more model configurations. The results for each configuration will be documented in the example's corresponding README.md.


Owner

Awni Hannun
