LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

Last update: Dec 29, 2022

Overview

LightHuBERT

| Github | Huggingface | SUPERB Leaderboard |

The authors' PyTorch implementation and pretrained models of LightHuBERT.

March 2022: released the preprint on arXiv and the checkpoints on Hugging Face.

Pre-Trained Models

Model	Pre-Training Dataset	Download Link
LightHuBERT Base	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_base.pt
LightHuBERT Small	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_small.pt
LightHuBERT Stage 1	960 hrs LibriSpeech	huggingface: lighthubert/lighthubert_stage1.pt

The pre-trained models were trained with common.fp16: true, so model inference can be performed with fp16 weights.
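As general background, and not code from the LightHuBERT repository: fp16 stores each value in 2 bytes instead of the 4 bytes used by fp32, at the cost of precision. The minimal sketch below uses Python's standard struct module, whose 'e' format is IEEE 754 half precision, to show both effects. The helper names and the example parameter count are hypothetical. In PyTorch, a loaded model's weights are usually converted with model.half() and fed half-precision inputs; that usage is an assumption here, not a documented LightHuBERT recipe.

```python
import struct

# Bytes per stored weight for the two precisions discussed above.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_bytes(num_params, dtype):
    """Storage needed for num_params weights in the given precision (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype]


def fp16_round_trip(value):
    """Pack a Python float as IEEE 754 half precision and read it back."""
    packed = struct.pack("<e", value)  # 'e' = 16-bit half-precision float, 2 bytes
    return struct.unpack("<e", packed)[0]


if __name__ == "__main__":
    n = 1_000_000  # hypothetical parameter count, for illustration only
    print(weight_bytes(n, "fp32"), weight_bytes(n, "fp16"))  # fp16 needs half the bytes
    print(fp16_round_trip(0.1))  # 0.1 is not exactly representable in fp16
```

Halving the bytes per weight is what makes fp16 inference attractive. Values also get rounded: 0.1 comes back as 0.0999755859375.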

Requirements and Installation

PyTorch version >= 1.8.1
Python version >= 3.6
numpy version >= 1.19.3
To install lighthubert:

git clone git@github.com:mechanicalsea/lighthubert.git
cd lighthubert
pip install --editable .
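You can check the minimum versions listed under the requirements before or after installation. The sketch below is a hypothetical helper and is not part of the lighthubert package. It compares dotted version strings numerically, so 1.10.0 correctly counts as newer than 1.8.1, which plain string comparison gets wrong. It also drops local build suffixes such as +cu111 that PyTorch wheels often carry. The installed versions would come from, for example, torch.__version__, numpy.__version__ and platform.python_version().

```python
import re

# Minimum versions taken from the requirements list above.
REQUIREMENTS = {"python": "3.6", "torch": "1.8.1", "numpy": "1.19.3"}


def parse_version(version):
    """'1.8.1+cu111' -> (1, 8, 1); stops at the first non-numeric component."""
    public = version.split("+", 1)[0]  # drop local build suffix such as +cu111
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)  # keep the leading digits, e.g. '3rc1' -> 3
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Numeric comparison, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_requirements(installed_versions):
    """Return the names whose installed version is missing or too old."""
    return [
        name
        for name, minimum in REQUIREMENTS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    ]


if __name__ == "__main__":
    import platform

    # Example values; in practice read torch.__version__ and numpy.__version__.
    print(unmet_requirements({
        "python": platform.python_version(),
        "torch": "1.7.1",
        "numpy": "1.21.0",
    }))
```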

Load Pre-Trained Models for Inference

import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

# run on GPU when available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
wav_input_16khz = torch.randn(1, 10000, device=device)  # (batch, samples) at 16 kHz

# load the pre-trained checkpoint
checkpoint = torch.load('/path/to/lighthubert.pt', map_location='cpu')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'  # should match the loaded checkpoint, e.g. lighthubert_base.pt
model = LightHuBERT(cfg)
model = model.to(device)
model = model.eval()
# with strict=False, load_state_dict returns (and here prints) any missing/unexpected keys
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) sample a subnet from the supernet and activate it
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

with torch.no_grad():
    # extract the representation of the last layer
    rep = model.extract_features(wav_input_16khz)[0]

    # extract the representations of all layers
    hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

print(f"Last-layer representation equals the last hidden state: {torch.allclose(rep, hs[-1])}")
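The optional subnet step relies on LightHuBERT being a once-for-all supernet. sample_subnet() picks a smaller architecture that shares its weights with the supernet, and calc_sampled_param_num() reports that architecture's size. The toy sketch below shows the idea in plain Python. The search space, the dictionary keys and the parameter formula are assumptions made for illustration; they are not LightHuBERT's actual configuration. The formula counts only standard Transformer encoder layers (attention projections, feed-forward network and two LayerNorms). It ignores the convolutional feature extractor and all other modules, so its numbers will not match calc_sampled_param_num().

```python
import random

# Hypothetical search space (NOT LightHuBERT's real one).
SEARCH_SPACE = {
    "embed_dim": [256, 384, 512, 640, 768],
    "ffn_ratio": [3.0, 3.5, 4.0],
    "layers": [10, 11, 12],
}


def sample_subnet(rng):
    """Randomly pick one value per dimension; the FFN ratio is chosen per layer."""
    layers = rng.choice(SEARCH_SPACE["layers"])
    return {
        "embed_dim": rng.choice(SEARCH_SPACE["embed_dim"]),
        "layers": layers,
        "ffn_ratio": [rng.choice(SEARCH_SPACE["ffn_ratio"]) for _ in range(layers)],
    }


def layer_params(d, ffn_dim):
    """Parameters of one standard Transformer encoder layer, biases included."""
    attn = 4 * (d * d + d)  # query, key, value and output projections
    ffn = (d * ffn_dim + ffn_dim) + (ffn_dim * d + d)  # two linear layers
    norms = 2 * (2 * d)  # two LayerNorms, each with scale and shift
    return attn + ffn + norms


def encoder_params(subnet):
    """Sum layer_params over the sampled layers of a subnet."""
    d = subnet["embed_dim"]
    return sum(layer_params(d, int(d * r)) for r in subnet["ffn_ratio"])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sample is reproducible
    subnet = sample_subnet(rng)
    print(f"subnet (encoder params {encoder_params(subnet) / 1e6:.1f}M) | {subnet}")
```

As a sanity check, layer_params(768, 3072) gives 7,087,872. That is the familiar per-layer count of a 768-dimensional Transformer encoder layer with a 3072-dimensional FFN.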

More examples can be found in our tutorials.

Universal Representation Evaluation on SUPERB
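In the usual SUPERB protocol, the pre-trained (upstream) model stays frozen. A downstream model receives a learnable, softmax-normalized weighted sum of the hidden states from every layer, which is why per-layer outputs such as hs from extract_features(..., ret_hs=True) matter. The minimal sketch below shows only that weighting step in plain Python, and the function names are hypothetical. For readability it assumes one feature vector per layer. Real hidden states are batch × time × feature tensors, and the same weighting applies to them elementwise.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(hidden_states, layer_logits):
    """Combine per-layer feature vectors with softmax(layer_logits) weights.

    hidden_states: list with one equal-length feature vector per layer.
    layer_logits: one learnable score per layer (trained with the downstream model).
    """
    if len(hidden_states) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    dim = len(hidden_states[0])
    return [
        sum(w * layer[i] for w, layer in zip(weights, hidden_states))
        for i in range(dim)
    ]


if __name__ == "__main__":
    layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy features for 3 layers
    print(weighted_layer_sum(layers, [0.0, 0.0, 0.0]))  # equal weights: layer average
```

With equal logits the result is a plain layer average. Training the logits lets the downstream task emphasize whichever layers are most useful to it.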

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Reference

If you find our work useful in your research, please cite the following paper:

@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}

Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang.

Owner

WangRui