SpeechNAS: Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification

Last update: May 20, 2022

Overview

speechnas

ASRU 2021 IEEE Automatic Speech Recognition and Understanding

If this repository is useful to you, please cite our work properly. Thank you!

SpeechNAS: Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification, ASRU 2021.

Environment

Set up the environment for the repository by installing:

PyTorch 1.7+

Check configuration

Check configuration in ./config/

Inference

bash metric/metric_eer/auto_run.sh
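The path metric/metric_eer suggests that this script reports the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of the metric only, and not the repository's implementation, the sketch below computes EER from a list of trial scores. The function name `equal_error_rate`, the label convention (1 = same speaker, 0 = different speaker), and the "accept when score >= threshold" rule are all assumptions made for this example.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for verification trials.

    scores: similarity scores, higher means "more likely the same speaker".
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Hypothetical helper for illustration; not part of the SpeechNAS code.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for y in labels if y == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer, best_threshold = float("inf"), None, None
    # Try every observed score as a threshold; a trial is accepted
    # when its score is >= the threshold. This is O(n^2), which is
    # fine for a sketch but slow for large trial lists.
    for threshold in sorted(set(scores)):
        false_accepts = sum(
            1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
        )
        false_rejects = sum(
            1 for s, y in zip(scores, labels) if y == 1 and s < threshold
        )
        far = false_accepts / n_nontarget  # false acceptance rate
        frr = false_rejects / n_target     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # On a finite trial list FAR and FRR rarely cross exactly,
            # so report their average at the closest point.
            best_gap, eer, best_threshold = gap, (far + frr) / 2, threshold
    return eer, best_threshold


if __name__ == "__main__":
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    demo_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    eer, threshold = equal_error_rate(demo_scores, demo_labels)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

Evaluation toolkits typically interpolate between neighbouring thresholds on the ROC curve, so their reported EER can differ slightly from this nearest-point estimate.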

Recently, the x-vector has been a successful and popular approach for speaker verification. It employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances. Improving on the x-vector has been an active research area, and numerous neural networks have been elaborately designed based on it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we try to identify the optimal architectures from a TDNN-based search space using neural architecture search (NAS); we name the approach SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, SpeechNAS-1 to SpeechNAS-5, with various numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our best derived network achieves an equal error rate (EER) of 1.02% on the standard test set of VoxCeleb1, which surpasses previous TDNN-based state-of-the-art approaches by a large margin.
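To make the role of statistics pooling concrete, here is a minimal pure-Python sketch of the basic mean-and-standard-deviation variant, which maps a variable number of frame-level vectors to one fixed-size vector. It is an illustration under stated assumptions, not the SpeechNAS code: the function name `statistics_pooling` is hypothetical, the standard deviation is the population form (divide by the number of frames), and the high-order statistics pooling mentioned above would append further moments that are not shown here.

```python
import math


def statistics_pooling(frames):
    """Pool T frame vectors of size D into one vector of size 2 * D.

    frames: list of T lists, each holding D floats (frame-level features).
    Returns the per-dimension means followed by the per-dimension
    population standard deviations. Hypothetical helper for illustration.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")

    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds


if __name__ == "__main__":
    short_utterance = [[1.0, 2.0], [3.0, 2.0]]                         # T = 2
    long_utterance = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0], [2.0, 0.0]]  # T = 4
    # Both utterances yield a vector of length 2 * D = 4.
    print(statistics_pooling(short_utterance))
    print(statistics_pooling(long_utterance))
```

Because the output size depends only on the feature dimension and not on the number of frames, the layers after pooling can work on utterances of any length.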

Owner

Wentao Zhu
