Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments with three approaches to ST:

- Cascaded ST, which consists of two steps: Automatic Speech Recognition (ASR) followed by Machine Translation (MT)
- Direct ST: models trained only on ST data
- (Main contribution) End-to-end ST that limits the use of ST data: multi-modal models that leverage ASR and MT training data for the ST task
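To make the cascaded setup concrete, here is a minimal sketch of how the two steps compose. It is not code from this repository: `make_cascaded_st`, `toy_asr`, and `toy_mt` are hypothetical names, and the toy functions only stand in for trained ASR and MT models.

```python
from typing import Callable, Sequence

# Hypothetical placeholder type: a waveform as a sequence of samples.
Audio = Sequence[float]


def make_cascaded_st(
    asr: Callable[[Audio], str],
    mt: Callable[[str], str],
) -> Callable[[Audio], str]:
    """Compose an ASR model and an MT model into a cascaded ST system."""

    def translate(audio: Audio) -> str:
        transcript = asr(audio)  # step 1: source audio -> source-language text
        return mt(transcript)    # step 2: source-language text -> target-language text

    return translate


# Toy stand-ins for trained models (assumptions for illustration only).
def toy_asr(audio: Audio) -> str:
    return "hello world"


def toy_mt(text: str) -> str:
    return {"hello world": "hallo wereld"}.get(text, text)


cascaded = make_cascaded_st(toy_asr, toy_mt)
print(cascaded([0.0, 0.1, -0.1]))
```

Because the MT step only sees the transcript, any recognition error in step 1 is passed on to the translation. Direct and end-to-end models avoid this intermediate text.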

The Transformer architecture is used as the baseline for the implementation.

High-level instructions for using the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interest, changing the variables in the script if needed:
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh for evaluating cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional training data in the opposite direction
- run_fine_tunning.sh and run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh and modality_similarity_classifier.sh for measuring the similarity between text and audio representations
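As background for the SVCCA-based similarity script, the following is a minimal NumPy sketch of the general SVCCA idea: reduce each set of representations with SVD, then average the canonical correlations between the reduced subspaces. It is an illustration under assumptions, not the implementation used in this repository (which is derived from nlp-dke/svcca). The function name `svcca`, the `keep_variance` parameter, and the 0.99 default are hypothetical.

```python
import numpy as np


def svcca(x: np.ndarray, y: np.ndarray, keep_variance: float = 0.99) -> float:
    """Mean canonical correlation between two sets of representations.

    x, y: arrays of shape (n_samples, n_features) computed on the same
    samples, e.g. pooled text and audio encoder outputs (an assumption).
    """

    def reduce(a: np.ndarray) -> np.ndarray:
        # Center, then keep the top singular directions that explain
        # `keep_variance` of the total variance.
        a = a - a.mean(axis=0, keepdims=True)
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
        k = min(int(np.searchsorted(ratio, keep_variance)) + 1, len(s))
        return u[:, :k] * s[:k]

    xr, yr = reduce(x), reduce(y)
    # CCA via QR: with orthonormal bases Qx and Qy of the two reduced
    # subspaces, the canonical correlations are the singular values of Qx^T Qy.
    qx, _ = np.linalg.qr(xr)
    qy, _ = np.linalg.qr(yr)
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())
```

A score close to 1 means the two sets of representations span nearly the same subspace up to a linear transformation; unrelated representations score much lower.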

See notebooks/Repo_Instruction.ipynb for more details.

Owner

Tu Anh Dinh
