Romanian Automatic Speech Recognition from the ROBIN project

Last update: Jan 01, 2023

Overview

RobinASR

This repository contains Robin's Automatic Speech Recognition (RobinASR) system for Romanian. It is based on the DeepSpeech2 architecture and comes with a KenLM language model that improves the transcriptions.

The pretrained speech-to-text model can be downloaded from here, and the pretrained KenLM model from here.

Also, make sure to visit:

A demo of the ASR system, available on the RELATE platform: https://relate.racai.ro/index.php?path=robin/asr
A post-processing web service for hyphenation and basic capitalization restoration: https://github.com/racai-ai/RobinASRHyphenationCorrection

Installation

Docker

Download the pretrained speech-to-text model and the pretrained KenLM model from the links above, and copy them into a models directory inside this repository.
Build the Docker image using the Dockerfile. Before building, make sure that deepspeech_pytorch/configs/inference_config.py contains the desired configuration.

docker build --tag robinasr .

Run the Docker image:

docker run --gpus all -p 8888:8888 --net=host --ipc=host robinasr

From Source

You must have Python 3.6+ and PyTorch 1.5.1+ installed on your system. CUDA 10.1+ is also required if you want to use the GPU version, which is recommended.
Clone the repository and install its dependencies:

git clone https://github.com/racai-ai/RobinASR.git
cd RobinASR
pip3 install -r requirements.txt
pip3 install -e .
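You can check that the requirements above are met with a short script. This is a sketch and is not part of RobinASR. It compares the minimum versions listed in this README against the values that Python and PyTorch report (sys.version_info, torch.__version__, torch.version.cuda, torch.cuda.is_available()). The parse_version helper is a hypothetical, simplified parser and does not implement full PEP 440 version handling.

```python
import sys


def parse_version(version):
    """Turn strings such as '1.5.1+cu101' or '10.1' into a 3-tuple of ints."""
    core = str(version).split("+")[0]  # drop local build tags like '+cu101'
    parts = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:  # keep leading digits only, so '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad short versions: '3.6' -> (3, 6, 0)
    return tuple(parts)


def meets_minimum(version, minimum):
    return parse_version(version) >= parse_version(minimum)


def check_environment():
    """Return a list of warnings; an empty list means every check passed."""
    warnings = []
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    if not meets_minimum(python_version, "3.6"):
        warnings.append("Python {} found, 3.6+ required".format(python_version))
    try:
        import torch
    except ImportError:
        warnings.append("PyTorch is not installed")
        return warnings
    if not meets_minimum(torch.__version__, "1.5.1"):
        warnings.append("PyTorch {} found, 1.5.1+ required".format(torch.__version__))
    if not torch.cuda.is_available():
        warnings.append("No CUDA device visible to PyTorch (GPU version recommended)")
    elif not meets_minimum(torch.version.cuda or "0", "10.1"):
        warnings.append("PyTorch built for CUDA {}, 10.1+ required".format(torch.version.cuda))
    return warnings


if __name__ == "__main__":
    found = check_environment()
    for warning in found:
        print("WARNING:", warning)
    if not found:
        print("Environment looks OK")
```

The check only inspects the CUDA version that PyTorch was built against. It does not inspect the system driver, so a missing-GPU warning can also mean that the driver or the container runtime is not exposing the GPU.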

Install NVIDIA Apex:

git clone --recursive https://github.com/NVIDIA/apex.git
cd apex && pip install .

If you want to use beam search with the KenLM language model, you must also install ctcdecode:

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
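To see what the language model contributes, consider the DeepSpeech2-style objective that ctcdecode exposes through its alpha (language-model weight) and beta (word-insertion bonus) parameters. Under that objective, a hypothesis y is scored as log P_ctc(y|x) + alpha * log P_lm(y) + beta * word_count(y). The sketch below is illustrative only. ctcdecode applies this scoring inside the beam search, whereas this code merely reranks a finished list of hypotheses. The log-probabilities are invented, and the assumption that RobinASR's inference config exposes equivalent weights should be checked in inference_config.py.

```python
def fused_score(ctc_log_prob, lm_log_prob, transcript, alpha, beta):
    """Shallow fusion: acoustic score + weighted LM score + word bonus."""
    word_count = len(transcript.split())
    return ctc_log_prob + alpha * lm_log_prob + beta * word_count


def rerank(candidates, alpha, beta):
    """Return the transcript of the best (transcript, ctc_lp, lm_lp) tuple."""
    best = max(candidates, key=lambda c: fused_score(c[1], c[2], c[0], alpha, beta))
    return best[0]


# Made-up log probabilities for two hypotheses of the same utterance.
candidates = [
    ("bună ziua", -4.0, -6.0),
    ("bună zi ua", -3.8, -12.0),
]
print(rerank(candidates, alpha=0.0, beta=0.0))  # acoustic only: bună zi ua
print(rerank(candidates, alpha=0.5, beta=1.0))  # with the LM: bună ziua
```

With alpha=0, the acoustically slightly better but misspelled hypothesis wins. Once the language model is weighted in, the well-formed Romanian phrase scores -5.0 against -6.8. The beta bonus counteracts the LM's tendency to favour shorter outputs, so alpha and beta are usually tuned together on a validation set.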

Inference Server

First, check the configuration file deepspeech_pytorch/configs/inference_config.py and make sure that it meets your requirements. Then, start the server:

python3 server.py
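This README does not document the server's HTTP interface. The client below is therefore a hedged sketch. It assumes that the server accepts a WAV upload as multipart/form-data on port 8888, the port exposed by the docker run command above. The route "/transcribe" and the form-field name "file" are hypothetical placeholders, so look up the real values in server.py before using this code. Only the Python standard library is used.

```python
import os
import uuid
import urllib.request

# HYPOTHETICAL values: the route and form-field name are placeholders,
# not read from RobinASR's server.py; check that file for the real ones.
SERVER_URL = "http://localhost:8888/transcribe"
FIELD_NAME = "file"


def build_transcription_request(wav_bytes, filename, url=SERVER_URL, field=FIELD_NAME):
    """Wrap a WAV payload in a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    head = (
        "--{b}\r\n"
        'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).format(b=boundary, f=field, n=filename).encode("utf-8")
    tail = "\r\n--{}--\r\n".format(boundary).encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + wav_bytes + tail,
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
        method="POST",
    )


def transcribe(path, url=SERVER_URL, timeout=120):
    """Send one WAV file and return the raw response body as text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), os.path.basename(path), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Once the placeholders match the real server, call transcribe("utterance.wav"). The response is returned undecoded because its format (plain text or JSON) is not specified here.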

Train a New Model

You must create three CSV manifest files (train, valid and test). Each line contains the path to a WAV file and the path to its corresponding transcription, separated by a comma:

path_to_wav1,path_to_txt1
path_to_wav2,path_to_txt2
path_to_wav3,path_to_txt3
...
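If your corpus stores each transcription next to its audio (utt1.wav and utt1.txt in the same folder), the three manifests can be generated with a small script. This layout is an assumption, as are the file names train_manifest.csv, valid_manifest.csv and test_manifest.csv and the 5%/5% default split. The code rejects paths containing commas because the format above has no quoting.

```python
import random
from pathlib import Path


def find_pairs(data_dir):
    """Pair every .wav under data_dir with a .txt of the same name (assumed layout)."""
    pairs = []
    for wav in sorted(Path(data_dir).rglob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.is_file():  # audio without a transcript is skipped
            pairs.append((wav.resolve(), txt.resolve()))
    return pairs


def split_pairs(pairs, valid_frac=0.05, test_frac=0.05, seed=42):
    """Shuffle with a fixed seed and cut into valid/test/train lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "valid": shuffled[:n_valid],
        "test": shuffled[n_valid:n_valid + n_test],
        "train": shuffled[n_valid + n_test:],
    }


def write_manifest(pairs, out_path):
    """Write one 'wav_path,txt_path' line per utterance, with no header row."""
    with open(out_path, "w", encoding="utf-8") as f:
        for wav, txt in pairs:
            if "," in str(wav) or "," in str(txt):
                raise ValueError("comma in path: {} / {}".format(wav, txt))
            f.write("{},{}\n".format(wav, txt))


def build_manifests(data_dir, out_dir, **split_kwargs):
    """Create the three manifest files and return the number of lines in each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    splits = split_pairs(find_pairs(data_dir), **split_kwargs)
    for name, pairs in splits.items():
        write_manifest(pairs, out_dir / "{}_manifest.csv".format(name))
    return {name: len(pairs) for name, pairs in splits.items()}


if __name__ == "__main__":
    import sys
    print(build_manifests(sys.argv[1], sys.argv[2]))
```

A fixed seed keeps the split reproducible between runs. If several recordings come from the same speaker, consider splitting by speaker instead, so that the test set measures generalization to unseen voices.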

Then, edit deepspeech_pytorch/configs/train_config.py to match your configuration and start training with:

python train.py
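A broken manifest line usually surfaces only mid-training, so it can pay to check the manifests before launching train.py. The sketch below is a helper and is not part of RobinASR. It verifies that each line has exactly two comma-separated paths, that the transcript is readable as UTF-8 and non-empty, and that the audio opens with Python's wave module (plain PCM WAV only). It also optionally compares the sample rate against a value you supply. This README does not state which rate the model expects, so take it from your configuration.

```python
import wave
from pathlib import Path


def validate_manifest(manifest_path, expected_rate=None):
    """Return (problems, total_audio_seconds) for a 'wav_path,txt_path' manifest."""
    problems = []
    total_seconds = 0.0
    with open(manifest_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            parts = line.split(",")
            if len(parts) != 2:
                problems.append("line {}: expected 2 fields, got {}".format(lineno, len(parts)))
                continue
            wav_path, txt_path = Path(parts[0]), Path(parts[1])
            try:
                if not txt_path.read_text(encoding="utf-8").strip():
                    problems.append("line {}: empty transcript {}".format(lineno, txt_path))
            except (OSError, UnicodeDecodeError) as exc:
                problems.append("line {}: unreadable transcript {} ({})".format(lineno, txt_path, exc))
            try:
                with wave.open(str(wav_path), "rb") as w:
                    rate = w.getframerate()
                    total_seconds += w.getnframes() / rate
            except (OSError, EOFError, wave.Error) as exc:
                problems.append("line {}: unreadable audio {} ({})".format(lineno, wav_path, exc))
                continue
            if expected_rate is not None and rate != expected_rate:
                problems.append("line {}: {} Hz, expected {} Hz".format(lineno, rate, expected_rate))
    return problems, total_seconds
```

Run it on each of the three manifests. The returned total duration is also a quick sanity check on the size of each split.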

Acknowledgments

We would like to thank Sean Naren for making his DeepSpeech2 implementation publicly available. Much of our code is based on his.

Cite

If you are using this repository, please cite the following paper as a thank you to the authors:

Avram, A.M., Păiș, V. and Tufiș, D., 2020, October. Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2. In Proc. Rom. Acad. Ser. A (Vol. 21, pp. 395-402).

or in BibTeX format:

@inproceedings{avram2020towards,
  title={Towards a Romanian end-to-end automatic speech recognition based on Deepspeech2},
  author={Avram, Andrei-Marius and Păiș, Vasile and Tufiș, Dan},
  booktitle={Proceedings of the Romanian Academy, Series A},
  pages={395--402},
  year={2020}
}

Owner

RACAI
