Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model, trained on the 35-word Speech Commands dataset (v0.02).

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download and extract the dataset, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
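Speech Commands v0.02 ships with `validation_list.txt` and `testing_list.txt`, which list the clips reserved for validation and testing; every other clip in a keyword folder is normally used for training. This README does not say whether `train.py` uses these official lists, so the following is only a minimal sketch under that assumption. `split_dataset` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def split_dataset(data_dir):
    """Split Speech Commands clips into train/validation/test lists.

    Hypothetical helper. Assumes the archive was extracted into data_dir,
    with one folder per keyword plus `_background_noise_`, and the official
    split files `validation_list.txt` and `testing_list.txt` at the top level.
    """
    root = Path(data_dir)

    def read_list(name):
        # Each line is a relative path such as "yes/<speaker>_nohash_0.wav".
        text = (root / name).read_text()
        return {line.strip() for line in text.splitlines() if line.strip()}

    val_set = read_list("validation_list.txt")
    test_set = read_list("testing_list.txt")

    # Keyword labels are all directories except the background-noise folder.
    labels = sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name != "_background_noise_"
    )

    splits = {"train": [], "validation": [], "test": []}
    for label in labels:
        for wav in sorted((root / label).glob("*.wav")):
            rel = f"{label}/{wav.name}"
            if rel in val_set:
                splits["validation"].append((rel, label))
            elif rel in test_set:
                splits["test"].append((rel, label))
            else:
                splits["train"].append((rel, label))
    return labels, splits
```

For example, `labels, splits = split_dataset("./data")`; with the full v0.02 archive extracted into `./data`, `labels` should hold the 35 keyword folder names.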

Set up a virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
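As a rough guide to how `--num_layers`, `--d_model`, `--num_heads` and `--mlp_dim` affect model size, the sketch below counts the trainable parameters of the encoder stack alone. It assumes a standard Transformer encoder layer (multi-head self-attention with per-head size `d_model // num_heads`, a two-layer MLP of width `mlp_dim`, and two layer norms, all with biases) and ignores the input projection, class token, positional embeddings and classifier head. The layers in this repository may differ; `encoder_param_count` is a hypothetical helper.

```python
def encoder_param_count(num_layers, d_model, num_heads, mlp_dim):
    """Approximate trainable parameters in a stack of Transformer encoder layers.

    Assumes per-head size d_model // num_heads and biases on every dense layer.
    """
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    head_dim = d_model // num_heads
    inner = num_heads * head_dim  # equals d_model under this assumption

    # Query, key and value projections: weights plus biases.
    qkv = 3 * (d_model * inner + inner)
    # Output projection from the concatenated heads back to d_model.
    attn_out = inner * d_model + d_model
    # Two dense layers: d_model -> mlp_dim -> d_model.
    mlp = (d_model * mlp_dim + mlp_dim) + (mlp_dim * d_model + d_model)
    # Two layer norms, each with a scale and an offset vector.
    norms = 2 * 2 * d_model

    per_layer = qkv + attn_out + mlp + norms
    return num_layers * per_layer
```

With illustrative values, `encoder_param_count(num_layers=12, d_model=64, num_heads=1, mlp_dim=256)` returns 599808. Because the per-head size is `d_model // num_heads` here, changing `--num_heads` alone leaves the count unchanged and only changes how the attention width is split across heads.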

To monitor training metrics, point TensorBoard at the log directory:

tensorboard --logdir ${Path to log directory}

Predicting the keyword of an audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
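Speech Commands clips are one-second, 16 kHz mono WAV files, so an input file is typically brought to exactly 16000 samples before feature extraction. The helper below is a hypothetical, standard-library-only sketch of that step, assuming 16-bit PCM input; the actual preprocessing in `test.py` (resampling, spectrogram or MFCC features) is not described here and may differ.

```python
import array
import sys
import wave

TARGET_RATE = 16000          # Speech Commands sample rate (Hz)
TARGET_LEN = TARGET_RATE     # one second of audio


def load_clip(path):
    """Read a 16-bit mono WAV file and zero-pad or trim it to one second.

    Hypothetical helper. Returns a list of floats in [-1.0, 1.0) and raises
    ValueError for other channel counts, sample widths or sample rates.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        if wf.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz, got {wf.getframerate()} Hz")
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Trim long clips, zero-pad short ones to exactly TARGET_LEN samples.
    samples = samples[:TARGET_LEN]
    padded = list(samples) + [0] * (TARGET_LEN - len(samples))
    # Scale integer samples to floats.
    return [s / 32768.0 for s in padded]
```

Padding with zeros (silence) at the end keeps every input the same length, which a fixed-size Transformer input with positional embeddings requires.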

Owner

Intelligent Machines Limited
