Unofficial TensorFlow implementation of the Keyword Spotting Transformer model

Last update: May 11, 2022

Overview

Keyword Spotting Transformer

This is an unofficial TensorFlow implementation of the Keyword Spotting Transformer model. The model is trained on the 35-word Speech Commands dataset.

Paper: Keyword Transformer: A Self-Attention Model for Keyword Spotting

Model architecture

Download the dataset

To download and extract the dataset, run the following commands:

wget https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz
mkdir data
mv ./speech_commands_v0.02.tar.gz ./data
cd ./data
tar -xf ./speech_commands_v0.02.tar.gz
cd ../
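
After extraction, the `data` directory is expected to hold one sub-directory per keyword containing the WAV clips, alongside a `_background_noise_` folder. A minimal Python sketch for checking that layout is shown here; the helper name `list_keywords` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def list_keywords(data_dir):
    """Return the sorted names of keyword folders holding at least one WAV file."""
    root = Path(data_dir)
    keywords = []
    for entry in root.iterdir():
        # Folders starting with "_" (such as _background_noise_) are not keyword classes.
        if entry.is_dir() and not entry.name.startswith("_"):
            if any(entry.glob("*.wav")):
                keywords.append(entry.name)
    return sorted(keywords)
```

Calling `list_keywords("./data")` from the repository root should return the keyword names; for the v0.02 archive the list would be expected to have 35 entries.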

Setup virtual environment

virtualenv -p python3 venv
source ./venv/bin/activate

Install dependencies

pip install -r requirements.txt

Training the model

To train the model, run this command, replacing each `${...}` placeholder with your own value:

python3 train.py --data_dir ${Path to data directory} \
                 --logdir ${Path to log directory} \
                 --num_layers ${Number of sequential encoder layers} \
                 --d_model ${Dimension of the encoder layers} \
                 --num_heads ${Number of heads in multi head attention layer} \
                 --mlp_dim ${Dimension of mlp layers} \
                 --lr ${Learning rate} \
                 --weight_decay ${Weight decay} \
                 --batch_size ${Batch size} \
                 --epochs ${Number of epochs} \
                 --save_dir ${Directory to save the model weights}
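
For reference, the flags above could be parsed with `argparse` roughly as follows. This is only a sketch: the flag names and descriptions come from the command above, while the value types, the `required=True` settings, and the `build_parser` name are assumptions, and the actual `train.py` may handle them differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented train.py flags (types are assumed)."""
    parser = argparse.ArgumentParser(description="Keyword Transformer training (sketch)")
    # (flag, assumed type, description taken from the README command)
    flags = [
        ("--data_dir", str, "Path to data directory"),
        ("--logdir", str, "Path to log directory"),
        ("--num_layers", int, "Number of sequential encoder layers"),
        ("--d_model", int, "Dimension of the encoder layers"),
        ("--num_heads", int, "Number of heads in multi head attention layer"),
        ("--mlp_dim", int, "Dimension of mlp layers"),
        ("--lr", float, "Learning rate"),
        ("--weight_decay", float, "Weight decay"),
        ("--batch_size", int, "Batch size"),
        ("--epochs", int, "Number of epochs"),
        ("--save_dir", str, "Directory to save the model weights"),
    ]
    for name, value_type, description in flags:
        # No defaults are set because the README does not state any.
        parser.add_argument(name, type=value_type, required=True, help=description)
    return parser
```

Running `build_parser().parse_args()` then yields a namespace with attributes such as `args.num_layers` and `args.lr`.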

To track your training metrics, launch TensorBoard with the same log directory:

tensorboard --logdir ${Path to log directory}

Predicting keyword of audio file

To predict the keyword spoken in an audio file, run:

python3 test.py --model_dir ${Saved model directory} \
                --file_path ${Audio file}
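
The clips in the Speech Commands dataset are 16 kHz mono WAV files of about one second, so a file in a different format may need converting before prediction. A small standard-library sketch for inspecting a file first is given here; `describe_wav` is a hypothetical helper, and whether `test.py` resamples or pads the input on its own is not stated in this README.

```python
import wave


def describe_wav(file_path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(file_path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration is the frame count divided by frames per second.
        duration = wav.getnframes() / sample_rate
    return sample_rate, channels, duration
```

A dataset-like clip would be expected to report a 16000 Hz sample rate, one channel, and a duration of at most about one second.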

Owner

Intelligent Machines Limited
