Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Introduction

This repository contains an unofficial implementation of the SpeakerGAN paper. It was implemented by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with advice from TongFeng.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on a randomly sampled test set.

acc: 93.21% on a test set sampled from a fixed start.

using model file: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test accuracy is about 4% lower than the result reported in the paper. We have not been able to find the reason, and we would welcome your help!
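
The two test-set figures above differ only in how a fixed-length segment is cut from each utterance. The following is a minimal sketch of that difference, not the repository's code: the function name `sample_segment` is hypothetical, the 160-frame segment length is taken from the input shape listed in the details, and "fixed start" is assumed to mean starting at frame 0.

```python
import random

SEGMENT_FRAMES = 160  # segment length in frames, from the input shape (160, 64)


def sample_segment(frames, mode="random", rng=random):
    """Cut one fixed-length segment from a list of feature frames.

    mode="random": start at a uniformly random valid offset.
    mode="fixed":  always start at frame 0 (an assumption about what
                   "fixed start" means in the results above).
    """
    if len(frames) < SEGMENT_FRAMES:
        raise ValueError("utterance shorter than one segment; pad it first")
    if mode == "random":
        # randint uses inclusive bounds, so the last valid offset is included.
        start = rng.randint(0, len(frames) - SEGMENT_FRAMES)
    elif mode == "fixed":
        start = 0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return frames[start:start + SEGMENT_FRAMES]
```

Random offsets expose the model to different parts of each test utterance on every run, which may explain why the two accuracies are not identical.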

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: LibriSpeech train-clean-100, POI (speakers): 251
data preprocess: VAD, mean and variance normalization, shuffled.
split: 60% train, 40% test.
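
The normalization and split listed above can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the repository's code: the function names are hypothetical, normalization is assumed to be applied per utterance and per filterbank dimension, and the 60/40 split is assumed to be taken over a shuffled list of items. The README does not say whether the split is made per speaker.

```python
import math
import random


def mean_variance_normalize(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance over time.

    frames: list of frames, each a list of floats (e.g. 64 fbank values).
    """
    n = len(frames)
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / (s + eps) for v, m, s in zip(frame, means, stds)]
        for frame in frames
    ]


def shuffled_split(items, train_fraction=0.6, seed=0):
    """Shuffle items and split them into (train, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(round(len(items) * train_fraction))
    return items[:cut], items[cut:]
```

For example, `shuffled_split(range(10))` returns 6 training items and 4 test items.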

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResNet blocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU
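
As an illustration of two of the building blocks named above, here is a small dependency-free sketch. It uses the standard GLU definition (split the channels into halves `a` and `b`, output `a * sigmoid(b)`) and reads the average pooling step in D as pooling over the time axis, which is an assumption. The function names are hypothetical, and the real layers in this repository are PyTorch modules whose exact configuration is not given here.

```python
import math


def glu(values):
    """Gated linear unit on one feature vector.

    Splits the vector into halves (a, b) and returns a * sigmoid(b),
    so the output has half as many channels as the input.
    """
    if len(values) % 2:
        raise ValueError("GLU needs an even number of channels")
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [x * (1.0 / (1.0 + math.exp(-g))) for x, g in zip(a, b)]


def average_pool_over_time(frames):
    """Average a (time, channels) list of frames into one channel vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]
```

The gate lets the network scale each channel between 0 and 1 depending on the input, while the pooling step turns a variable number of frames into one fixed-size vector for the FC layer.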

================ training ==================

lr: epochs 0-9, 0.0005 | epochs 9-49, 0.0002
L(d): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv loss: label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
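
The training settings above can be sketched as three small helpers. This is a hedged illustration rather than the repository's training loop: the function names are hypothetical, the learning-rate ranges are assumed to be epochs, the switch is assumed to happen at epoch 9, and the smoothed labels are assumed to be drawn uniformly from the listed ranges.

```python
import random


def learning_rate(epoch):
    """Step schedule from the list above; the boundary epoch is an assumption."""
    return 0.0005 if epoch < 9 else 0.0002


def smoothed_targets(batch_size, real, rng=random):
    """Adversarial targets with label smoothing.

    Real labels (1) are drawn from [0.7, 1.0], fake labels (0) from [0.0, 0.3].
    """
    low, high = (0.7, 1.0) if real else (0.0, 0.3)
    return [rng.uniform(low, high) for _ in range(batch_size)]


def is_generator_step(d_step, d_steps_per_g_step=4):
    """True on every 4th discriminator step, when G is also updated."""
    return (d_step + 1) % d_steps_per_g_step == 0
```

With these helpers, a loop over batches would update D on every step and update G only when `is_generator_step` returns True, giving the 4:1 ratio listed above.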

======== uncertain points or differences from the paper ========

weight and bias initialization: we use xavier_uniform and zeros.
PyTorch huber_loss: needs + 0.5 to match the paper, but this is not implemented here.
for shorter wavs, the paper pads with zeros; we pad by repeating the features.
gated CNN architecture.
we use webrtcvad mode 3 for VAD preprocessing.
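
The padding difference in the list above can be sketched as follows. Both functions are hypothetical illustrations on plain lists of frames: `pad_with_zeros` follows the behaviour attributed to the paper, and `pad_by_repeating` is one plausible reading of "padded with feature again" (repeat the utterance's own frames from the start), not a copy of the repository's code.

```python
def pad_with_zeros(frames, target_len):
    """Paper-style padding: append all-zero frames up to target_len."""
    if not frames:
        raise ValueError("cannot pad an empty utterance")
    dim = len(frames[0])
    missing = max(0, target_len - len(frames))
    return list(frames) + [[0.0] * dim for _ in range(missing)]


def pad_by_repeating(frames, target_len):
    """Repeat the utterance's own frames from the start until target_len."""
    if not frames:
        raise ValueError("cannot pad an empty utterance")
    padded = list(frames)
    while len(padded) < target_len:
        # Copy only as many frames as are still missing.
        padded.extend(frames[: target_len - len(padded)])
    return padded
```

Neither function truncates: an utterance that is already at least `target_len` frames long is returned unchanged.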
