Autoregressive Predictive Coding: An unsupervised autoregressive model for speech representation learning

Overview

Autoregressive Predictive Coding

This repository contains the official implementation (in PyTorch) of Autoregressive Predictive Coding (APC) proposed in An Unsupervised Autoregressive Model for Speech Representation Learning.

APC is a speech feature extractor trained on a large amount of unlabeled data. With an unsupervised, autoregressive training objective, representations learned by APC not only capture general acoustic characteristics such as speaker and phone information from the speech signals, but are also highly accessible to downstream models--our experimental results on phone classification show that a linear classifier taking the APC representations as the input features significantly outperforms a multi-layer percepron using the surface features.

Dependencies

  • Python 3.5
  • PyTorch 1.0

Dataset

In the paper, we used the train-clean-360 split from the LibriSpeech corpus for training the APC models, and the dev-clean split for keeping track of the training loss. We used the log Mel spectrograms, which were generated by running the Kaldi scripts, as the input acoustic features to the APC models. Of course you can generate the log Mel spectrograms yourself, but to help you better reproduce our results, here we provide the links to the data proprocessed by us that can be directly fed to the APC models. We also include other data splits that we did not use in the paper for you to explore, e.g., you can try training an APC model on a larger and nosier set (e.g., train-other-500) and see if it learns more robust speech representations.

Training APC

Below we will follow the paper and use train-clean-360 and dev-clean as demonstration. Once you have downloaded the data, unzip them by running:

xz -d train-clean-360.xz
xz -d dev-clean.xz

Then, create a directory librispeech_data/kaldi and move the data into it:

mkdir -p librispeech_data/kaldi
mv train-clean-360-hires-norm.blogmel librispeech_data/kaldi
mv dev-clean-hires-norm.blogmel librispeech_data/kaldi

Now we will have to transform the data into the format loadable by the PyTorch DataLoader. To do so, simply run:

# Prepare the training set
python prepare_data.py --librispeech_from_kaldi librispeech_data/kaldi/train-clean-360-hires-norm.blogmel --save_dir librispeech_data/preprocessed/train-clean-360-hires-norm.blogmel
# Prepare the valication set
python prepare_data.py --librispeech_from_kaldi librispeech_data/kaldi/dev-clean-hires-norm.blogmel --save_dir librispeech_data/preprocessed/dev-clean-hires-norm-blogmel

Once the program is done, you will see a directory preprocessed/ inside librispeech_data/ that contains all the preprocessed PyTorch tensors.

To train an APC model, simply run:

python train_apc.py

By default, the trained models will be put in logs/. You can also use Tensorboard to trace the training progress. There are many other configurations you can try, check train_apc.py for more details--it is highly documented and should be self-explanatory.

Feature extraction

Once you have trained your APC model, you can use it to extract speech features from your target dataset. To do so, feed-forward the trained model on the target dataset and retrieve the extracted features by running:

_, feats = model.forward(inputs, lengths)

feats is a PyTorch tensor of shape (num_layers, batch_size, seq_len, rnn_hidden_size) where:

  • num_layers is the RNN depth of your APC model
  • batch_size is your inference batch size
  • seq_len is the maximum sequence length and is determined when you run prepare_data.py. By default this value is 1600.
  • rnn_hidden_size is the dimensionality of the RNN hidden unit.

As you can see, feats is essentially the RNN hidden states in an APC model. You can think of APC as a speech version of ELMo if you are familiar with it.

There are many ways to incorporate feats into your downstream task. One of the easiest way is to take only the outputs of the last RNN layer (i.e., feats[-1, :, :, :]) as the input features to your downstream model, which is what we did in our paper. Feel free to explore other mechanisms.

Pre-trained models

We release the pre-trained models that were used to produce the numbers reported in the paper. load_pretrained_model.py provides a simple example of loading a pre-trained model.

Reference

Please cite our paper(s) if you find this repository useful. This first paper proposes the APC objective, while the second paper applies it to speech recognition, speech translation, and speaker identification, and provides more systematic analysis on the learned representations. Cite both if you are kind enough!

@inproceedings{chung2019unsupervised,
  title = {An unsupervised autoregressive model for speech representation learning},
  author = {Chung, Yu-An and Hsu, Wei-Ning and Tang, Hao and Glass, James},
  booktitle = {Interspeech},
  year = {2019}
}
@inproceedings{chung2020generative,
  title = {Generative pre-training for speech with autoregressive predictive coding},
  author = {Chung, Yu-An and Glass, James},
  booktitle = {ICASSP},
  year = {2020}
}

Contact

Feel free to shoot me an email for any inquiries about the paper and this repository.

Owner
iamyuanchung
Natural language & speech processing researcher
iamyuanchung
Algorithmic trading using machine learning.

Algorithmic Trading This machine learning algorithm was built using Python 3 and scikit-learn with a Decision Tree Classifier. The program gathers sto

Sourav Biswas 101 Nov 10, 2022
A Python library for working with arbitrary-dimension hypercomplex numbers following the Cayley-Dickson construction of algebras.

Hypercomplex A Python library for working with quaternions, octonions, sedenions, and beyond following the Cayley-Dickson construction of hypercomplex

7 Nov 04, 2022
Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Source Code

Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning Trevor Ablett*, Bryan Chan*,

STARS Laboratory 8 Sep 14, 2022
exponential adaptive pooling for PyTorch

AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling Abstract Pooling layers are essential building blocks of Convolutional Ne

Alexandros Stergiou 55 Jan 04, 2023
Self-supervised learning optimally robust representations for domain generalization.

OptDom: Learning Optimal Representations for Domain Generalization This repository contains the official implementation for Optimal Representations fo

Yangjun Ruan 18 Aug 25, 2022
A modular active learning framework for Python

Modular Active Learning framework for Python3 Page contents Introduction Active learning from bird's-eye view modAL in action From zero to one in a fe

modAL 1.9k Dec 31, 2022
The implementation of ICASSP 2020 paper "Pixel-level self-paced learning for super-resolution"

Pixel-level Self-Paced Learning for Super-Resolution This is an official implementaion of the paper Pixel-level Self-Paced Learning for Super-Resoluti

Elon Lin 41 Dec 15, 2022
Image Completion with Deep Learning in TensorFlow

Image Completion with Deep Learning in TensorFlow See my blog post for more details and usage instructions. This repository implements Raymond Yeh and

Brandon Amos 1.3k Dec 23, 2022
Leaderboard and Visualization for RLCard

RLCard Showdown This is the GUI support for the RLCard project and DouZero project. RLCard-Showdown provides evaluation and visualization tools to hel

Data Analytics Lab at Texas A&M University 246 Dec 26, 2022
Official Implementation of PCT

Official Implementation of PCT Prerequisites python == 3.8.5 Please make sure you have the following libraries installed: numpy torch=1.4.0 torchvisi

32 Nov 21, 2022
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

ccks2021-track3 CCKS2021中文NLP地址相关性任务-赛道三-冠军方案 团队:我的加菲鱼- wodejiafeiyu 初赛第二/复赛第一/决赛第一 前言 19年开始,陆陆续续参加了一些比赛,拿到过一些top,比较懒一直都没分享过,这次比较幸运又拿了top1,打算分享下 分类的任务

shaochenjie 131 Dec 31, 2022
Tree-based Search Graph for Approximate Nearest Neighbor Search

TBSG: Tree-based Search Graph for Approximate Nearest Neighbor Search. TBSG is a graph-based algorithm for ANNS based on Cover Tree, which is also an

Fanxbin 2 Dec 27, 2022
A Comparative Framework for Multimodal Recommender Systems

Cornac Cornac is a comparative framework for multimodal recommender systems. It focuses on making it convenient to work with models leveraging auxilia

Preferred.AI 671 Jan 03, 2023
Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 01, 2023
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Generating Anime Images by Implementing Deep Convolutional Generative Adversarial Networks paper

AnimeGAN - Deep Convolutional Generative Adverserial Network PyTorch implementation of DCGAN introduced in the paper: Unsupervised Representation Lear

Rohit Kukreja 23 Jul 21, 2022
Free-duolingo-plus - Duolingo account creator that uses your invite code to get you free duolingo plus

free-duolingo-plus duolingo account creator that uses your invite code to get yo

1 Jan 06, 2022
Code implementing "Improving Deep Learning Interpretability by Saliency Guided Training"

Saliency Guided Training Code implementing "Improving Deep Learning Interpretability by Saliency Guided Training" by Aya Abdelsalam Ismail, Hector Cor

8 Sep 22, 2022
PyTorch implementation of DCT fast weight RNNs

DCT based fast weights This repository contains the official code for the paper: Training and Generating Neural Networks in Compressed Weight Space. T

Kazuki Irie 4 Dec 24, 2022
RTSeg: Real-time Semantic Segmentation Comparative Study

Real-time Semantic Segmentation Comparative Study The repository contains the official TensorFlow code used in our papers: RTSEG: REAL-TIME SEMANTIC S

Mennatullah Siam 592 Nov 18, 2022