OneShot Learning-based hotword detection.

Last update: Dec 25, 2022

Overview

EfficientWord-Net

Hotword detection based on one-shot learning

Home assistants require special phrases called hotwords to get activated (e.g. "OK Google").

EfficientWord-Net is a hotword detection engine based on one-shot learning, inspired by FaceNet's Siamese network architecture. It works much like face recognition: it needs only a few samples of your own custom hotword to get going. No extra training or huge datasets are required. This lets developers add custom hotwords to their programs without effort or extra charges. Just like Google Assistant's hotword detector, the engine performs best when 3-4 hotword samples are collected directly from the user. This repository is the official implementation of EfficientWord-Net as a Python library, from the authors.

The library is written purely in Python and uses Google's TFLite implementation for faster real-time inference.

Demo of EfficientWord-Net in Pi

EfficientWord-Net.mp4

Access preprint

The research paper is currently under review at IEEE; click here to access the preprint. The training code will be made publicly available once the paper is published.

Python Version Requirements

This library works with Python versions 3.6 to 3.9.
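A script that depends on the library could check the interpreter version up front. The following is a minimal sketch, assuming the stated range is inclusive at both ends; the helper name `is_supported` is hypothetical and not part of the library.

```python
import sys

# Supported range taken from the statement above: 3.6 through 3.9 (assumed inclusive).
MIN_VERSION = (3, 6)
MAX_VERSION = (3, 9)


def is_supported(version_info=None):
    """Return True if the (major, minor) version lies within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported():
    # Warn rather than exit, so the check stays advisory.
    print("Warning: Python %d.%d is outside the 3.6 to 3.9 range" % sys.version_info[:2])
```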

Dependencies Installation

Before running the pip installation command for the library, a few dependencies need to be installed manually.

PyAudio (depends on PortAudio)
Tflite (tensorflow lightweight binaries)
Librosa (Binaries might not be available for certain systems)

Mac OS M* and Raspberry Pi users might have to compile these dependencies.

The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is first initialized on the system.

The librosa package is not required for inference-only use; it is installed automatically when generate_reference is called.

Package Installation

Run the following pip command

pip install EfficientWord-Net

and import it with

import eff_word_net

Demo

After installing the packages, you can run the demo script built into the library (ensure you have a working mic).

Access the documentation at: https://ant-brain.github.io/EfficientWord-Net/

Command to run demo

python -m eff_word_net.engine

Generating Custom Wakewords

For any new hotword, the library needs information about the hotword. This information is read from a file called {wakeword}_ref.json. E.g. for the wakeword 'alexa', the library needs a file called alexa_ref.json.
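The naming convention can be captured in a small helper that builds the expected path and fails early if the file is missing. This is only a sketch: the function names are hypothetical, and lower-casing the wakeword is an assumption based on file names such as mycroft_ref.json.

```python
import os


def reference_file_path(wakeword, folder):
    """Build the expected path of the reference file for `wakeword`."""
    # Assumption: file names use the lower-cased wakeword, as in mycroft_ref.json.
    return os.path.join(folder, f"{wakeword.lower()}_ref.json")


def require_reference_file(wakeword, folder):
    """Return the reference file path, raising if the file does not exist."""
    path = reference_file_path(wakeword, folder)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Missing reference file: {path}")
    return path
```

The returned path could then be passed as the reference_file argument of HotwordDetector.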

These files can be generated with the following procedure:

Collect 4 to 10 distinct-sounding pronunciations of the wakeword and put them into a separate folder that doesn't contain anything else.

Then run this command; it will ask for the input folder's location (containing the audio files) and the output folder (where the _ref.json file will be stored).

python -m eff_word_net.generate_reference
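Before running the command above, the sample folder can be checked against the two stated requirements (4 to 10 recordings, nothing else in the folder). The sketch below is not part of the library; the function name and the set of accepted extensions (.wav and .mp3) are assumptions.

```python
import os

# Assumption: recordings are stored as .wav or .mp3 files.
AUDIO_EXTENSIONS = {".wav", ".mp3"}
MIN_SAMPLES, MAX_SAMPLES = 4, 10


def check_sample_folder(folder):
    """Return a list of problems found in `folder`; an empty list means it looks fine."""
    problems = []
    entries = sorted(os.listdir(folder))
    audio = [e for e in entries
             if os.path.splitext(e)[1].lower() in AUDIO_EXTENSIONS]
    others = [e for e in entries if e not in audio]
    if not MIN_SAMPLES <= len(audio) <= MAX_SAMPLES:
        problems.append(
            f"expected {MIN_SAMPLES} to {MAX_SAMPLES} audio files, found {len(audio)}")
    if others:
        problems.append("unexpected entries: " + ", ".join(others))
    return problems
```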

The path of the generated reference file needs to be passed to the HotwordDetector instance.

HotwordDetector(
        hotword="hello",
        reference_file = "/full/path/name/of/hello_ref.json",
        activation_count = 3 #2 by default
)

For a few wakewords, such as Mycroft, Google, Firefox, Alexa, Mobile and Siri, the library has predefined embeddings readily available in the library installation directory. Its path is available in the following variable:

from eff_word_net import samples_loc

Try your first single hotword detection script

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

mycroft_hw = HotwordDetector(
        hotword="Mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft ")
while True :
    frame = mic_stream.getFrame()
    result = mycroft_hw.checkFrame(frame)
    if(result):
        print("Wakeword uttered")

Detecting Multiple Hotwords from audio streams

The library provides a computation-friendly way to detect multiple hotwords from a given stream, instead of running checkFrame() of each wakeword individually.

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
        hotword="Alexa",
        reference_file = os.path.join(samples_loc,"alexa_ref.json"),
    )

siri_hw = HotwordDetector(
        hotword="Siri",
        reference_file = os.path.join(samples_loc,"siri_ref.json"),
    )

mycroft_hw = HotwordDetector(
        hotword="mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

multi_hw_engine = MultiHotwordDetector(
        detector_collection = [
            alexa_hw,
            siri_hw,
            mycroft_hw,
        ],
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft / Alexa / Siri")

while True :
    frame = mic_stream.getFrame()
    result = multi_hw_engine.findBestMatch(frame)
    if(None not in result):
        print(result[0],f",Confidence {result[1]:0.4f}")
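The `None not in result` check in the loop suggests that findBestMatch returns a pair containing None when no hotword matches. The selection step can be pictured roughly as follows. This is an illustrative sketch only, not the library's implementation; the function name, the dictionary inputs and the per-hotword thresholds are assumptions.

```python
def find_best_match(scores, thresholds):
    """Pick the hotword with the highest score among those above their threshold.

    scores:     dict mapping hotword name -> similarity score for the current frame
    thresholds: dict mapping hotword name -> minimum score required to count as a match
    Returns (name, score), or (None, None) if nothing passes its threshold.
    """
    best_name, best_score = None, None
    for name, score in scores.items():
        if score < thresholds[name]:
            continue  # below this hotword's threshold, ignore it
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

Scoring one frame against every hotword and then selecting once is what makes this cheaper than calling each detector separately.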

Access documentation of the library from here : https://ant-brain.github.io/EfficientWord-Net/

About `activation_count` in `HotwordDetector`

Documentation with a detailed explanation of the activation_count parameter in HotwordDetector is in the making. For now, note that 3 is advisable for long hotwords and 2 for shorter ones. If the detector gives multiple triggers for a single utterance, try increasing activation_count. When experimenting, begin with smaller values. The default value is 2.
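One way to picture the effect of activation_count is as a counter of consecutive frames whose score passes the threshold. The sketch below rests on that assumption and is not the library's actual logic; the class name `ActivationCounter` is hypothetical.

```python
class ActivationCounter:
    """Fire once every `activation_count` consecutive frames that pass `threshold`."""

    def __init__(self, threshold, activation_count):
        self.threshold = threshold
        self.activation_count = activation_count
        self._streak = 0  # number of consecutive passing frames seen so far

    def update(self, score):
        """Feed one frame score; return True when a trigger should fire."""
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0  # any failing frame breaks the streak
        if self._streak >= self.activation_count:
            self._streak = 0  # start counting again after a trigger
            return True
        return False
```

Under this reading, an utterance that stays above the threshold for N frames produces about N divided by activation_count triggers, which is why a larger value reduces repeated triggers for long hotwords.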

FAQ :

Hotword performance is bad: if you are having an issue like this, feel free to ask about it in discussions.

CONTRIBUTION:

If you have ideas to make the project better, feel free to ping us in discussions.
The current logmelcalc.tflite graph can convert only 1 audio frame to a Log Mel Spectrogram at a time. It would be a great help if TensorFlow gurus out there could help us with this.

TODO :

Add an audio file handler in streams. PRs are welcome.
Remove the librosa requirement to encourage generating reference files directly on edge devices.
Add more detailed documentation explaining the sliding window concept.
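Until that documentation exists, the sliding window idea from the last item can be sketched in general terms: a fixed-length window moves over the audio in small hops, so consecutive windows overlap. The sample rate, window length and hop length below are assumptions chosen for illustration, not values confirmed by the library.

```python
def sliding_windows(samples, window_size, hop_size):
    """Yield overlapping windows of `window_size` samples, advancing by `hop_size`."""
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    # Stop as soon as a full window no longer fits in the remaining samples.
    for start in range(0, len(samples) - window_size + 1, hop_size):
        yield samples[start:start + window_size]


# Illustrative values (assumptions): 16 kHz audio, 1 s window, 1/8 s hop.
RATE = 16000
WINDOW = RATE
HOP = RATE // 8

two_seconds = [0] * (2 * RATE)
windows = list(sliding_windows(two_seconds, WINDOW, HOP))
```

Each window would be scored independently, so a hotword is seen by several consecutive windows as it passes through.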

SUPPORT US:

Our hotword detector's performance is notably low when compared to Porcupine. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to keep developing the engine. If you loved this project, recommend it to your peers, give us a 🌟 on GitHub and a clap 👏 on Medium.

LICENSE : Apache License 2.0

Comments

Threshold value in engine.py not working?

hello,

first of all, thank you for this great library!

I managed to make it work on my M1 MacBook Air, and trying out my personal hotword detection, but the threshold value does not seem to be working on my environment.

In engine.py:

    def __init__(
            self,
            hotword:str,
            reference_file:str,
            threshold:float=0.995,
            activation_count=2,
            continuous=True,
            verbose = False):

And this is my script:

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

hotword_hw = HotwordDetector(
        hotword="hotword",
        reference_file = "hotword_ref.json",
        activation_count=3
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say hotword ")
while True :
    frame = mic_stream.getFrame()
    result = hotword_hw.checkFrame(frame)
    if(result):
        print("Wakeword uttered")
        print(hotword_hw.getMatchScoreFrame(frame))

and when I run this script, the checkFrame returns true even when the getMatchScoreFrame returns under the threshold, like:

Wakeword uttered
0.9371609374494279
Wakeword uttered
0.9164050520717595
Wakeword uttered
0.9082509350226378
...

Could you please take a look at this?

Thank you!

opened by dominickchen 10

Hotword detection triggers the moment any sound is being played, even with the default models

So I've been trying to make a custom hotword. But after seeing it trigger all the time, the moment any kind of sound is being recorded, I decided to use a default one, like "brightness", "mobile", "google", etc.

They all trigger immediately. Using the default values for the HotwordDetector, by the way. Any clue why? It seemed to have worked great in your video presentation.

Not using a cheap ass microphone by the way.

opened by TrackLab 9
circuit diagram

Good evening! Can you send me the circuit diagram for connecting the Raspberry Pi to the breadboard and lighting the LED? Your experiment is so interesting that I want to repeat it.
documentation

opened by preachwebsite 5
Invalid sample rate

ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround40.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround40 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No 
such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround41 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround50 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround51 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround71.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) 
Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or 
directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) 
ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2043 Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2713 Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2837 Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9997] Invalid sample rate

Good evening! I have encountered this problem. How can I solve it? Looking forward to your reply.
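An "Invalid sample rate" error from PortAudio generally means the selected input device does not accept the requested rate. One way to investigate is to list the input devices that do accept it. The sketch below uses PyAudio's device query calls; the 16000 Hz rate and 16-bit mono format are assumptions about what the stream requests, and the helper names are hypothetical.

```python
def find_input_devices(pa, rate=16000, sample_format=None):
    """Return (index, name) for every input device that accepts `rate`.

    `pa` is a pyaudio.PyAudio instance (or any object with the same methods).
    """
    usable = []
    for index in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) < 1:
            continue  # output-only device
        try:
            # PyAudio raises ValueError when the combination is unsupported.
            pa.is_format_supported(rate, input_device=index,
                                   input_channels=1,
                                   input_format=sample_format)
        except ValueError:
            continue
        usable.append((index, info.get("name")))
    return usable


def main():
    import pyaudio  # imported here so the helper above has no hard dependency

    pa = pyaudio.PyAudio()
    try:
        for index, name in find_input_devices(pa, 16000, pyaudio.paInt16):
            print(index, name)
    finally:
        pa.terminate()
```

Calling main() on the affected machine prints the candidate devices; an empty list would point to a resampling problem at the ALSA level rather than in the library.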
raspberrypi

opened by preachwebsite 3
Could you help me?

I am not able to set up its runtime environment. Could you set it up remotely? I have TeamViewer, a remote control program. The ID is 621 081 831. Other remote control tools are fine too. I look forward to your help.

opened by preachwebsite 3
raising precision of custom wakeword

I'm curious whether the precision of custom wakeword improves if you provide more sound files, e.g. 50 files from different people? or is that meaningless?

We want to use a custom wakeword for a public interaction system, and want it to recognize voice input from a wide range of people (young&old, male&female, etc).

Thank you for letting me know.

opened by dominickchen 3
Invalid input device (no default output device)

ALSA lib conf.c:3723:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so (libasound_module_conf_pulse.so: libasound_module_conf_pulse.so: cannot open shared object file: No such file or directory) ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL hw:0 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) 
function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib 
dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib 
dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9996] Invalid input device (no default output device)

I have encountered this problem. How can I solve it?

opened by preachwebsite 2

Works fine with the provided reference file, but not with a custom recording.

Hello, I am working on Google Colab, so I don't have access to a mic. The workaround is to use an mp3 or wav file. To do that I have added this class:

from streams import CustomAudioStream
from pydub import AudioSegment

import numpy as np
import wave

RATE = 16000
index = 0

class SimpleFileStream(CustomAudioStream) :

    def open_stream(self, src, mp3):
        if mp3:
          dst = "Data/sample.wav"
          # convert mp3 to wav              
          sound = AudioSegment.from_mp3(src).set_frame_rate(16000)
          sound.export(dst, format="wav")
          self.wf = wave.open(dst, 'rb')
        else:
          print("Not an mp3")
          self.wf = wave.open(src, 'rb')
          self.wf.rewind()
        print("Get params of wav file " + str(self.wf.getparams()))

    def close_stream(self):
        self.wf.close()

    def get_next_frame(self):
        global index
        print("Index ", index)
        index = index + self.CHUNK
        return np.frombuffer(self.wf.readframes(self.CHUNK),dtype=np.int16)

    """
    Implements stream with sliding window, 
    implemented by inheriting CustomAudioStream
    """
    def __init__(self,sliding_window_secs:float=1/8):
        self.CHUNK = int(sliding_window_secs*RATE)

        CustomAudioStream.__init__(
            self,
            open_stream = self.open_stream,
            close_stream = self.close_stream,
            get_next_frame = self.get_next_frame,
        )

It seems to work if I use a reference file from GitHub. But if I record a custom file using Audacity, it does not detect the wakeword.

If I change the threshold to 0.7 and the activation count to 2, it works better, but it will increase the chance of getting false positives.

Is it mandatory to have a custom reference for each user?

Best regards Sebastien

opened by warichet 2

Bump numpy from 1.20.0 to 1.22.0
Bumps numpy from 1.20.0 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update init.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Discussion : Hotword's accuracy too low

If you are experimenting with your own custom hotwords and some of them do not work well, feel free to use the discussion thread: https://github.com/Ant-Brain/EfficientWord-Net/discussions/4

opened by TheSeriousProgrammer 0
Hotword matches without any utterance

Hi, first of all, thanks for making this library, it's fantastic!! I understand that it is at a very early stage, so it will have some issues and will get better over time. This time I was trying the given hotword detector example and attached speech recognition to run after the hotword triggers, but the performance is quite messy. To demonstrate this I am including this gif.

Problem 1: I call speech recognition right after there is a match. As soon as the speech recognition ends, it again shows the hotword as uttered, with a confidence value, and listens again, even though no hotword was uttered.

Problem 2: In some situations it also matches on a small click or a sound from the desk.

Is there any fix, at least for Problem 1? I see that Problem 2 could be caused by weak training, depending on the hotword.

opened by OnlinePage 2

OSError: [Errno -9981] Input overflowed

I've installed the python library with

pip install EfficientWord-Net

onto a Raspberry Pi 2 with a recent Raspbian Lite.

However, if I run the demo with python -m eff_word_net.engine, I get the following error and nothing works:

Say Mycroft / Alexa / Siri
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/engine.py", line 333, in <module>
    frame = mic_stream.getFrame()
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 49, in getFrame
    new_frame = self._get_next_frame()
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 85, in <lambda>
    np.frombuffer(mic_stream.read(CHUNK),dtype=np.int16)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 608, in read
    return pa.read_stream(self._stream, num_frames, exception_on_overflow)
OSError: [Errno -9981] Input overflowed

any idea?
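
A workaround often suggested for this PortAudio error is to tell PyAudio not to raise when the input buffer overflows. The sketch below assumes the microphone stream is a PyAudio input stream, whose read method accepts exception_on_overflow (the traceback above shows that argument being forwarded). The names read_frame, FakeStream and the CHUNK value are hypothetical and used only for illustration; this is not the library's own code, and the thread does not confirm whether dropping samples is acceptable for detection on a Raspberry Pi 2.

```python
from array import array

CHUNK = 1024  # frames per read; an assumed value, not the library's setting


def read_frame(stream, chunk=CHUNK):
    """Read one chunk of 16-bit mono audio without raising on input overflow.

    `stream` is assumed to behave like a PyAudio input stream, i.e. to expose
    read(num_frames, exception_on_overflow=...). Samples lost to an overflow
    are silently dropped instead of raising OSError.
    """
    raw = stream.read(chunk, exception_on_overflow=False)
    samples = array("h")  # signed 16-bit integers, comparable to np.int16
    samples.frombytes(raw)
    return samples


class FakeStream:
    """Hypothetical stand-in for a microphone stream, used for a dry run."""

    def read(self, num_frames, exception_on_overflow=True):
        return bytes(2 * num_frames)  # silence: two zero bytes per sample


frame = read_frame(FakeStream(), chunk=4)
print(list(frame))
```

With a real microphone, the FakeStream object would be replaced by the stream returned from PyAudio's open call with input enabled.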

opened by mKenfenheuer 1

complex hotwords support #Current Model Limitations Discussion

Hi, thanks for your helpful research. I wonder whether the current model can handle complex hotwords like "Hey Siri", or only a single word like "Siri"?

My second question is about hotwords whose pronunciation takes longer than 1 s, like "Hey XXXX". Does your model support changing the recording time?

Did you try using cosine_similarity instead of Euclidean distance at inference time?

Thanks.
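
To make the last question concrete, here is a minimal sketch comparing the two metrics on made-up embedding vectors. It assumes nothing about the model's real embeddings or thresholds; the vectors and function names are illustrative. One fact worth noting: for unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine similarity, so if the embeddings were L2-normalized (an assumption, not confirmed here) both metrics would rank candidates identically.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up two-dimensional "embeddings", both of unit length.
reference = [0.6, 0.8]
candidate = [0.8, 0.6]

d = euclidean_distance(reference, candidate)
c = cosine_similarity(reference, candidate)

# For unit vectors the two quantities are tied together.
print(d * d, 2 - 2 * c)
```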
enhancement

opened by amoazeni75 7
Problem with Dependencies #Docker Support

Hello, I left a comment on Reddit saying I would give it a go, and you said if I had a problem to log it here, so here I am, with a problem 😊

I seem to get stuck at pip3 install librosa. I get this error:

Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite

I can push on and get EfficientWord installed and working; if I say Alexa it says Yup I hear ya.

The problem comes when I try to create my own wake word. I run this command:

[email protected]:~ $ python3 -m eff_word_net.generate_reference
Paste Path of folder Containing audio files:/home/pi/wakewords
Paste Path of location to save *_ref.json :/home/pi/wakewords
Enter Wakeword Name :bender
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 80, in <module>
    input("Enter Wakeword Name :")
  File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 47, in generate_reference_file
    x, = librosa.load(audio_file,sr=16000)
AttributeError: module 'librosa' has no attribute 'load'

My problem is with librosa: I am not able to install it. I tried everything I could google, but it will never install.

How did you get around this problem?
enhancement good first issue raspberrypi wake_word_generation

opened by Balro76 3

Releases(stable)

stable(Feb 19, 2022)

The engine used a sliding window approach to listen for hotwords, which results in one utterance producing multiple triggers. To minimize this, some complicated logic was used, resulting in an unnecessarily complex API. In this release we shift to a simpler approach, relaxation_time (the minimum time required between any two triggers; earlier triggers are dismissed), resulting in a simpler programmer API.

However, these large updates are breaking changes :(
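
The relaxation_time idea can be illustrated with a small sketch. This is not the library's implementation: the class RelaxationGate, its accept method and the timestamps are hypothetical, and the sketch assumes one reading of the note above, namely that a trigger arriving less than relaxation_time seconds after the last accepted trigger is dismissed.

```python
class RelaxationGate:
    """Suppress repeated triggers that fall within a relaxation window."""

    def __init__(self, relaxation_time):
        self.relaxation_time = relaxation_time  # seconds between accepted triggers
        self._last_trigger = None               # time of the last accepted trigger

    def accept(self, timestamp):
        """Return True if a trigger at `timestamp` (seconds) should fire."""
        too_soon = (
            self._last_trigger is not None
            and timestamp - self._last_trigger < self.relaxation_time
        )
        if too_soon:
            return False
        self._last_trigger = timestamp
        return True


# Overlapping sliding windows report the same utterance several times.
gate = RelaxationGate(relaxation_time=0.8)
window_hits = [0.00, 0.25, 0.50, 2.00, 2.25]
accepted = [t for t in window_hits if gate.accept(t)]
print(accepted)  # [0.0, 2.0]
```

Here the five raw matches collapse into two triggers, one per utterance.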
Source code(tar.gz)
Source code(zip)
EfficientWord_Net-0.2.2-py3-none-any.whl(1.65 MB)
v0.1.1-beta(Jan 6, 2022)

Improved false positive reduction, and changes to reduce multiple triggers per utterance of the hotword to one.

TODO: Need to update documentation accordingly
Source code(tar.gz)
Source code(zip)
EfficientWord_Net-0.1.1-py3-none-any.whl(1.65 MB)

Owner

ANT-BRaiN

Small is the new big.

GitHub Repository https://ant-brain.github.io/EfficientWord-Net/

Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

Pruner for nested cross-validation - Sphinx-Doc Nested cross-validation is necessary to avoid biased model performance in embedded feature selection i

1 Dec 15, 2021

Deep generative models of 3D grids for structure-based drug discovery

What is liGAN? liGAN is a research codebase for training and evaluating deep generative models for de novo drug design based on 3D atomic density grid

152 Jan 03, 2023

S-attack library. Official implementation of two papers "Are socially-aware trajectory prediction models really socially-aware?" and "Vehicle trajectory prediction works, but not everywhere".

S-attack library: A library for evaluating trajectory prediction models This library contains two research projects to assess the trajectory predictio

71 Jan 04, 2023

Multiple style transfer via variational autoencoder

ST-VAE Multiple style transfer via variational autoencoder By Zhi-Song Liu, Vicky Kalogeiton and Marie-Paule Cani This repo only provides simple testi

13 Oct 29, 2022

Simple embedding based text classifier inspired by fastText, implemented in tensorflow

FastText in Tensorflow This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of

306 Dec 02, 2022

Normalization Matters in Weakly Supervised Object Localization (ICCV 2021)

Normalization Matters in Weakly Supervised Object Localization (ICCV 2021) 99% of the code in this repository originates from this link. ICCV 2021 pap

10 Feb 01, 2022

Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

CCOP Code of our paper Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning Requirement Install OpenSelfSup Install Detectron2

21 Dec 13, 2022

Voice Conversion Using Speech-to-Speech Neuro-Style Transfer

This repo contains the official implementation of the VAE-GAN from the INTERSPEECH 2020 paper Voice Conversion Using Speech-to-Speech Neuro-Style Transfer.

93 Jan 05, 2023

In this repo we learn the logic of the SAHI application.

SAHI-Learn: Would You Like to Code SAHI Together? Hello everyone, I am Kadir Nar. I am a volunteer developer for the SAHI library. This repo SAHI librar

11 Aug 22, 2022

Official implementation of Deep Convolutional Dictionary Learning for Image Denoising.

DCDicL for Image Denoising Hongyi Zheng*, Hongwei Yong*, Lei Zhang, "Deep Convolutional Dictionary Learning for Image Denoising," in CVPR 2021. (* Equ

91 Dec 21, 2022

Generating Radiology Reports via Memory-driven Transformer

R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,

101 Dec 13, 2022

Mmdet benchmark with python

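mmdet_benchmark This project studies the inference performance bottlenecks of mmdet and optimizes them. Configuration and environment Machine configuration CPU: Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU: NVIDIA GeForce RTX 3080 10GB Memory: 64G Disk: 1T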

24 May 21, 2022

IOT: Instance-wise Layer Reordering for Transformer Structures

Introduction This repository contains the code for Instance-wise Ordered Transformer (IOT), which is introduced in the ICLR2021 paper IOT: Instance-wi

19 Nov 15, 2022

JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

jumpdiff jumpdiff is a python library with non-parametric Nadaraya─Watson estimators to extract the parameters of jump-diffusion processes. With jumpd

28 Dec 10, 2022

[NeurIPS 2020] Official Implementation: "SMYRF: Efficient Attention using Asymmetric Clustering".

SMYRF: Efficient attention using asymmetric clustering Get started: Abstract We propose a novel type of balanced clustering algorithm to approximate a

46 Dec 22, 2022

This repository is the official implementation of Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models Link to paper Abstract We study prediction of future out

2 Aug 19, 2022