This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Last update: Sep 15, 2022


CPC_DeepCluster

Setup instructions

Clone the repo: https://github.com/iiscleap/CPC_DeepCluster.git
Install the system libraries required by torchaudio (https://github.com/pytorch/audio):

Linux: sudo apt-get install sox libsox-dev libsox-fmt-all

Create and activate the conda environment: conda env create -f environment.yml && conda activate cpc37
Run setup.py: python setup.py develop

Using the Repository

To start training:

python cpc/train_mod.py --pathDB $PATH_AUDIO_FILES --pathCheckpoint $PATH_CHECKPOINT_DIR --LabelsPath $Path_Pseudo_Labels --file_extension $EXTENSION --normMode batchNorm --rnnMode linear --nLevelsGRU 2 --max_size_loaded 1000000000 --save_step 1 --alpha_val $Cluster_Loss_Weighting

Where:

$PATH_AUDIO_FILES is the directory containing the audio files. The files should be arranged as below:

PATH_AUDIO_FILES
│
└───speaker1
│   └───...
│         │   seq_11.{$EXTENSION}
│         │   seq_12.{$EXTENSION}
│         │   ...
│
└───speaker2
    └───...
          │   seq_21.{$EXTENSION}
          │   seq_22.{$EXTENSION}

$PATH_CHECKPOINT_DIR is the directory where the checkpoints will be saved.
$EXTENSION is the file extension of the audio files.
$Path_Pseudo_Labels is the directory that contains the pseudo labels for all the audio files in $PATH_AUDIO_FILES.
$Cluster_Loss_Weighting is the weighting factor applied to the cluster loss.
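To illustrate what $Cluster_Loss_Weighting controls, here is a minimal sketch. It assumes (this is an assumption, not taken from train_mod.py) that the total objective is the CPC loss plus alpha times a frame-level cross-entropy between predicted cluster logits and the pseudo labels. The names `cross_entropy` and `combined_loss` are hypothetical, and plain Python lists stand in for the PyTorch tensors the real training code works with.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one frame's cluster logits against its integer pseudo label."""
    # Subtract the max logit for numerical stability (log-sum-exp trick)
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]


def combined_loss(cpc_loss, frame_logits, pseudo_labels, alpha):
    """Hypothetical total loss: CPC loss plus alpha times the mean cluster loss."""
    if len(frame_logits) != len(pseudo_labels):
        raise ValueError("need exactly one pseudo label per frame")
    cluster_loss = sum(
        cross_entropy(logits, label)
        for logits, label in zip(frame_logits, pseudo_labels)
    ) / len(pseudo_labels)
    # alpha = 0 reduces to plain CPC training; larger alpha weights the clustering term more
    return cpc_loss + alpha * cluster_loss
```

Under this assumption, setting --alpha_val 0 would recover the CPC objective alone.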
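Before launching training, the layout of $PATH_AUDIO_FILES can be checked with a short script. This is a sketch, not part of the repository: `list_audio_by_speaker` is a hypothetical helper that assumes the top-level directory under $PATH_AUDIO_FILES is the speaker, as in the tree above, and that any deeper subdirectories only group sequences.

```python
from collections import defaultdict
from pathlib import Path


def list_audio_by_speaker(path_audio_files, extension):
    """Group audio files under PATH_AUDIO_FILES by their top-level (speaker) directory."""
    root = Path(path_audio_files)
    ext = extension if extension.startswith(".") else "." + extension
    by_speaker = defaultdict(list)
    # rglob walks every nested subdirectory below root
    for path in sorted(root.rglob("*" + ext)):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            # A file directly in root has no speaker directory, so it is skipped
            continue
        by_speaker[rel.parts[0]].append(str(rel))
    return dict(by_speaker)
```

For example, `list_audio_by_speaker("/data/audio", "flac")` returns a dict mapping each speaker directory name to the relative paths of its .flac files; an empty dict usually means the extension or the directory is wrong.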

Restarting the session

To restart a session from the last saved checkpoint, run:

python cpc/train_mod.py --pathCheckpoint $PATH_CHECKPOINT_DIR
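The command above resumes from the most recent checkpoint in $PATH_CHECKPOINT_DIR. As a sketch of how "most recent" can be determined, the snippet below assumes checkpoints are named `checkpoint_<epoch>.pt`; this naming scheme is an assumption, so check the actual contents of your checkpoint directory. `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed naming scheme: checkpoint_<epoch>.pt
CHECKPOINT_RE = re.compile(r"^checkpoint_(\d+)\.pt$")


def latest_checkpoint(path_checkpoint_dir):
    """Return the path of the highest-numbered checkpoint, or None if there is none."""
    best_epoch, best_path = -1, None
    for path in Path(path_checkpoint_dir).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match and path.is_file():
            # Compare epochs numerically so checkpoint_10 beats checkpoint_9
            epoch = int(match.group(1))
            if epoch > best_epoch:
                best_epoch, best_path = epoch, path
    return best_path
```

Comparing the parsed integers rather than the file names matters: sorting the names as strings would place checkpoint_9.pt after checkpoint_10.pt.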

Generating the pseudo labels for training

First create quantized.txt using the repository here, then run:

python create_pseudolabels.py --input_file $Path_Containing_quantized.txt --out_path $Output_Dir

$Output_Dir is the directory where the .pt files containing the pseudo labels will be written.
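The first step of turning quantized.txt into pseudo labels is reading one unit sequence per audio file. The sketch below assumes each line has the form `<file_id>` followed by a tab and comma-separated integer cluster indices; this format is an assumption, and the exact behavior of create_pseudolabels.py is not documented here. `parse_quantized` is a hypothetical helper.

```python
def parse_quantized(lines):
    """Parse lines of the assumed form '<file_id>\t<u1>,<u2>,...' into {file_id: [units]}."""
    labels = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        file_id, sep, units = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_no}: expected a tab between file id and units")
        labels[file_id] = [int(u) for u in units.split(",") if u]
    return labels
```

Usage: `with open("quantized.txt") as f: labels = parse_quantized(f)`. Each resulting list could then be converted with `torch.tensor` and saved with `torch.save` into $Output_Dir, one file per audio file; the file naming used by the repository is not specified here.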

Extracting features, training K-means and language models

Extract the features for K-means clustering, and train the K-means model and the language models, using the repository here.
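The repository linked above handles this step. The sketch below only illustrates what K-means quantization does to extracted features: plain Lloyd's algorithm in pure Python with a deterministic initialization (the first k points), whereas practical setups typically use k-means++ initialization and an optimized library. Each feature vector is mapped to the index of its nearest centroid, and that index is the discrete unit. `kmeans` and `assign` are hypothetical names.

```python
import math


def assign(point, centroids):
    """Index of the nearest centroid (Euclidean distance); this index is the discrete unit."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm; centroids are initialized from the first k points for determinism."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(c)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], 2)` converges to the centroids `[[0.0, 0.5], [10.0, 10.5]]`, after which `assign([9, 9], centroids)` gives unit 1.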

Owner: LEAP Lab