Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Last update: Apr 30, 2022

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021), which you may find here. The paper is the result of a collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written in TensorFlow 1 (TF1), but it should work with TF2 too. I provide only the code for creating the network, together with the hyperparameters it requires. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for correlation pooling.

Beyond the experiments reported in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) extract correlation pooling from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
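Option (a) can be illustrated with a minimal sketch. It is written in NumPy rather than TensorFlow and is not the code from this repository. It assumes that correlation pooling keeps the upper triangle (without the diagonal) of the Pearson correlation matrix between channels, computed over time; the function names and toy sizes are hypothetical.

```python
import numpy as np

def stats_pooling(x):
    """Mean and standard deviation over time; x has shape (channels, frames)."""
    return np.concatenate([x.mean(axis=1), x.std(axis=1)])

def correlation_pooling(x, eps=1e-8):
    """Upper triangle (no diagonal) of the channel-by-channel correlation matrix.

    Assumption: Pearson correlation over the time axis; eps avoids division by zero.
    """
    centered = x - x.mean(axis=1, keepdims=True)
    normed = centered / (centered.std(axis=1, keepdims=True) + eps)
    corr = normed @ normed.T / x.shape[1]          # shape (channels, channels)
    rows, cols = np.triu_indices(x.shape[0], k=1)  # indices above the diagonal
    return corr[rows, cols]

def combined_pooling(x):
    """Concatenate both pooling outputs into a single vector."""
    return np.concatenate([stats_pooling(x), correlation_pooling(x)])

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 200))  # 8 channels, 200 frames (toy sizes)
pooled = combined_pooling(features)
print(pooled.shape)  # 2*8 statistics + 8*7/2 correlations = (44,)
```

With C channels, the combined vector has 2C + C(C-1)/2 entries, so the correlation part quickly dominates as C grows.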

The code could be written more efficiently using tensor-only operators. However, to facilitate research, we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
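The list-of-tensors style can be sketched as follows, again in NumPy and not taken from the repository. The sketch assumes that frequency bins are merged into ranges by averaging and that correlations are computed separately for each range; both choices, along with the function names and shapes, are illustrative assumptions.

```python
import numpy as np

def correlation_upper_triangle(x, eps=1e-8):
    """Pearson correlations between channels of x (channels, frames), upper triangle only."""
    centered = x - x.mean(axis=1, keepdims=True)
    normed = centered / (centered.std(axis=1, keepdims=True) + eps)
    corr = normed @ normed.T / x.shape[1]
    rows, cols = np.triu_indices(x.shape[0], k=1)
    return corr[rows, cols]

def pool_by_frequency_range(feature_map, num_ranges):
    """feature_map has shape (freq_bins, channels, frames)."""
    # Split the frequency axis into a list of groups of neighbouring bins.
    groups = np.array_split(feature_map, num_ranges, axis=0)
    # Merge the bins of each group (assumption: simple averaging).
    merged = [group.mean(axis=0) for group in groups]  # each (channels, frames)
    # Pool each frequency range on its own, then concatenate the results.
    pooled = [correlation_upper_triangle(m) for m in merged]
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((10, 6, 150))  # 10 bins, 6 channels, 150 frames
vec = pool_by_frequency_range(feature_map, num_ranges=5)
print(vec.shape)  # 5 ranges * 6*5/2 correlations = (75,)
```

Keeping the ranges in a Python list makes it easy to change the grouping or the per-range operation, at the cost of a loop that a single batched tensor operation would avoid.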

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you can reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get! Themos


Owner

Themos Stafylakis
