OneShot Learning-based hotword detection.

Overview

EfficientWord-Net

Versions : 3.6 ,3.7,3.8,3.9

Hotword detection based on one-shot learning

Home assistants require special phrases called hotwords to get activated (eg:"ok google")

EfficientWord-Net is an hotword detection engine based on one-shot learning inspired from FaceNet's Siamese Network Architecture. Works very similar to face recognition , just requires a few samples of your own custom hotword to get going. No extra training or huge datasets required!! This will allow developers to add custom hotwords to their programs without a sweat or any extra charges. Just like google assistant's hotword detector, the engine performs the best when 3-4 hotword samples are collected directly from the user This repository is an official implemenation of EfficientWord-Net as a python library from the authors.

The library is purely written with python and uses Google's Tflite implemenation for faster realtime inference.

Demo of EfficientWord-Net in Pi

EfficientWord-Net.mp4

Access preprint

The research paper is currently under review in IEEE, click here to access the preprint and the training code will be available for public access once the paper is published.

Python Version Requirements

This Library works between python versions: 3.6 to 3.9

Dependencies Installation

Before running the pip installation command for the library, few dependencies need to be installed manually.

tflite package cannot be listed in requirements.txt hence will be automatically installed when the package is initialized in the system.

librosa package is not required for inference only cases , however when generate_reference is called , will be automatically installed.


Package Installation

Run the following pip command

pip install EfficientWord-Net

and to import running

import eff_word_net

Demo

After installing the packages, you can run the Demo script inbuilt with library (ensure you have a working mic).

Accesss Documentation from : https://ant-brain.github.io/EfficientWord-Net/

Command to run demo

python -m eff_word_net.engine

Generating Custom Wakewords

For any new hotword, the library needs information about the hotword, this information is obtained from a file called {wakeword}_ref.json. Eg: For the wakeword 'alexa', the library would need the file called alexa_ref.json

These files can be generated with the following procedure:

One needs to collect few 4 to 10 uniquely sounding pronunciations of a given wakeword. Then put them into a seperate folder, which doesnt contain anything else.

Finally run this command, it will ask for the input folder's location (containing the audio files) and the output folder (where _ref.json file will be stored).

python -m eff_word_net.generate_reference

The pathname of the generated wakeword needs to passed to the HotwordDetector detector instance.

HotwordDetector(
        hotword="hello",
        reference_file = "/full/path/name/of/hello_ref.json"),
        activation_count = 3 #2 by default
)

Few wakewords such as Mycroft, Google, Firefox, Alexa, Mobile, Siri the library has predefined embeddings readily available in the library installation directory, its path is readily available in the following variable

from eff_word_net import samples_loc

Try your first single hotword detection script

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

mycroft_hw = HotwordDetector(
        hotword="Mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft ")
while True :
    frame = mic_stream.getFrame()
    result = mycroft_hw.checkFrame(frame)
    if(result):
        print("Wakeword uttered")

Detecting Mulitple Hotwords from audio streams

The library provides a computation friendly way to detect multiple hotwords from a given stream, installed of running checkFrame() of each wakeword individually

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
        hotword="Alexa",
        reference_file = os.path.join(samples_loc,"alexa_ref.json"),
    )

siri_hw = HotwordDetector(
        hotword="Siri",
        reference_file = os.path.join(samples_loc,"siri_ref.json"),
    )

mycroft_hw = HotwordDetector(
        hotword="mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

multi_hw_engine = MultiHotwordDetector(
        detector_collection = [
            alexa_hw,
            siri_hw,
            mycroft_hw,
        ],
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft / Alexa / Siri")

while True :
    frame = mic_stream.getFrame()
    result = multi_hw_engine.findBestMatch(frame)
    if(None not in result):
        print(result[0],f",Confidence {result[1]:0.4f}")

Access documentation of the library from here : https://ant-brain.github.io/EfficientWord-Net/

About activation_count in HotwordDetector

Documenatation with detailed explanation on the usage of activation_count parameter in HotwordDetector is in the making , For now understand that for long hotwords 3 is advisable and 2 for smaller hotwords. If the detector gives out multiple triggers for a single utterance, try increasing activation_count. To experiment begin with smaller values. Default value for the same is 2

FAQ :

  • Hotword Perfomance is bad : if you are having some issue like this , feel to ask the same in discussions

CONTRIBUTION:

  • If you have an ideas to make the project better, feel free to ping us in discussions
  • The current logmelcalc.tflite graph can convert only 1 audio frame to Log Mel Spectrogram at a time. It will be of a great help if tensorflow guru's outthere help us out with this.

TODO :

  • Add audio file handler in streams. PR's are welcome.
  • Remove librosa requirement to encourage generating reference files directly in edge devices
  • Add more detailed documentation explaining slider window concept

SUPPORT US:

Our hotword detector's performance is notably low when compared to Porcupine. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project. Hence your support and encouragement will motivate us to develop the engine. If you loved this project recommend this to your peers, give us a 🌟 in Github and a clap 👏 in medium.

LICENCSE : Apache License 2.0

Comments
  • Threshold value in engine.py not working?

    Threshold value in engine.py not working?

    hello,

    first of all, thank you for this great library!

    I managed to make it work on my M1 MacBook Air, and trying out my personal hotword detection, but the threshold value does not seem to be working on my environment.

    In engine.py:

        def __init__(
                self,
                hotword:str,
                reference_file:str,
                threshold:float=0.995,
                activation_count=2,
                continuous=True,
                verbose = False):
    

    And this is my script:

    import os
    from eff_word_net.streams import SimpleMicStream
    from eff_word_net.engine import HotwordDetector
    from eff_word_net import samples_loc
    
    hotword_hw = HotwordDetector(
            hotword="hotword",
            reference_file = "hotword_ref.json",
            activation_count=3
        )
    
    mic_stream = SimpleMicStream()
    mic_stream.start_stream()
    
    print("Say hotword ")
    while True :
        frame = mic_stream.getFrame()
        result = hotword_hw.checkFrame(frame)
            print("Wakeword uttered")
            print(hotword_hw.getMatchScoreFrame(frame))
    

    and when I run this script, the checkFrame returns true even when the getMatchScoreFrame returns under the threshold, like:

    Wakeword uttered
    0.9371609374494279
    Wakeword uttered
    0.9164050520717595
    Wakeword uttered
    0.9082509350226378
    ...
    

    Could you please take a look at this?

    Thank you!

    opened by dominickchen 10
  • Hotword detection triggers the moment any sound is being playd, even with the default models

    Hotword detection triggers the moment any sound is being playd, even with the default models

    So I've been trying to make a custom hotwork. But after seeing it trigger all the time, the moment any kind of sound is being recorded, I decided to use a default one, like "brightness", "mobile", "google", etc.

    They all trigger immediatley. Using the default values for the HotWordDetector, by the way. Any clue why? It seemed to have worked great in your video presentation.

    Not using a cheap ass microphone by the way.

    opened by TrackLab 9
  • circuit diagram

    circuit diagram

    Good evening!Can you send me the circuit diagram of raspberry pie connecting the bread board and lighting the LED light? Your experiment is so interesting that I want to repeat it.

    documentation 
    opened by preachwebsite 5
  • Invalid sample rate

    Invalid sample rate

    ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround40.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround40 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround41 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround50 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround51 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround71.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2043 Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2713 Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2837 Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9997] Invalid sample rate

    Good evening!I have encountered this problem. How can I solve it?Looking forward to your reply.

    raspberrypi 
    opened by preachwebsite 3
  • Could you help me?

    Could you help me?

    I won't deploy its running environment. Can you control it remotely? I have TeamViewer, a remote control software. The ID is 621 081 831. Or use other remote control. We look forward to your help.

    opened by preachwebsite 3
  • raising precision of custom wakeword

    raising precision of custom wakeword

    I'm curious whether the precision of custom wakeword improves if you provide more sound files, e.g. 50 files from different people? or is that meaningless?

    We want to use a custom wakeword for a public interaction system, and want it to recognize voice input from a wide range of people (young&old, male&female, etc).

    Thank you for letting me know.

    opened by dominickchen 3
  • Invalid input device (no default output device)

    Invalid input device (no default output device)

    ALSA lib conf.c:3723:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so (libasound_module_conf_pulse.so: libasound_module_conf_pulse.so: cannot open shared object file: No such file or directory) ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL hw:0 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9996] Invalid input device (no default output device)

    I have encountered this problem. How can I solve it?

    opened by preachwebsite 2
  • Here that working fine with ref file but not if a record custom file.

    Here that working fine with ref file but not if a record custom file.

    Hello, i working on google collab, so i don't have access to mic. The work around is to used mp3 or wav file. To do that i have add this class:

    from streams import CustomAudioStream
    from pydub import AudioSegment
    
    import numpy as np
    import wave
    
    RATE = 16000
    index = 0
    
    class SimpleFileStream(CustomAudioStream) :
    
        def open_stream(self, src, mp3):
            if mp3:
              dst = "Data/sample.wav"
              # convert mp3 to wav              
              sound = AudioSegment.from_mp3(src).set_frame_rate(16000)
              sound.export(dst, format="wav")
              self.wf = wave.open(dst, 'rb')
            else:
              print("Not an mp3")
              self.wf = wave.open(src, 'rb')
              self.wf.rewind()
            print("Get params of wav file " + str(self.wf.getparams()))
    
        def close_stream(self):
            self.wf.close()
    
        def get_next_frame(self):
            global index
            print("Index ", index)
            index = index + self.CHUNK
            return np.frombuffer(self.wf.readframes(self.CHUNK),dtype=np.int16)
    
        """
        Implements stream with sliding window, 
        implemented by inheriting CustomAudioStream
        """
        def __init__(self,sliding_window_secs:float=1/8):
            self.CHUNK = int(sliding_window_secs*RATE)
    
            CustomAudioStream.__init__(
                self,
                open_stream = self.open_stream,
                close_stream = self.close_stream,
                get_next_frame = self.get_next_frame,
            )
    

    It seems working if i used ref file of github. But if i record a custom file using audacity it is not detect the wakeword.

    If i change the threshold to 0.7 and the activation count to 2 it is work better, but il will increase the chance of getting false positive.

    Is it mandatory to have custom ref for each user ?

    Best regards Sebastien

    opened by warichet 2
  • Bump numpy from 1.20.0 to 1.22.0

    Bump numpy from 1.20.0 to 1.22.0

    Bumps numpy from 1.20.0 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Discussion : Hotword's accuracy too low

    Discussion : Hotword's accuracy too low

    If you are playing around with using your own custom hotwords and some hotword happen to not work so good Feel free to use the thread in discussions https://github.com/Ant-Brain/EfficientWord-Net/discussions/4

    opened by TheSeriousProgrammer 0
  • Hotword matches without any utterance

    Hotword matches without any utterance

    Hi, first of all thanks for making this library, Its fantastic!! I understand well since its in very early phase so it will have some issue and eventually it will better. So this time I was trying to go with the given example of hotword detector, I tried to attach a speech recognition after hotword triggers, but the performance is quite messy , to demonstrate this I am including this gif.

    Code_nyP9cJMBBg

    Problem1: Basically what happening is I am trying to call speech recognition right after there is match, as the speech recognition ends it again shows hotword uttered and re listen, even though there no hotword uttered and with confidence.

    Problem2: Also in some situations it matches when there is little click or desk sound.

    any fix for at least for Problem 1 I see problem 2 could be the reason of weak training as depending upon the hotword.

    opened by OnlinePage 2
  • OSError: [Errno -9981] Input overflowed

    OSError: [Errno -9981] Input overflowed

    I've installed the python library with

    pip install EfficientWord-Net

    onto a raspberry pi 2 with recent raspbian lite.

    However if i run the demo with python -m eff_word_net.engine i'll get the following error and nothing works:

    Say Mycroft / Alexa / Siri
    Traceback (most recent call last):
      File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/engine.py", line 333, in <module>
        frame = mic_stream.getFrame()
      File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 49, in getFrame
        new_frame = self._get_next_frame()
      File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 85, in <lambda>
        np.frombuffer(mic_stream.read(CHUNK),dtype=np.int16)
      File "/usr/lib/python3/dist-packages/pyaudio.py", line 608, in read
        return pa.read_stream(self._stream, num_frames, exception_on_overflow)
    OSError: [Errno -9981] Input overflowed
    

    any idea?

    opened by mKenfenheuer 1
  • complex hotwords support #Current Model Limitations Discussion

    complex hotwords support #Current Model Limitations Discussion

    Hi, Thanks for your helpful research. I wonder if the current model can handle complex hot words like "Hey Siri" or just handle one word, like "Siri"?

    My second question is about hot words that their pronunciation takes more than 1s, like"Hey XXXX." Does your model support changing the recording time?

    Did you try to use cosine_similarity instead of Euclidian distance in inference time?

    Thanks.

    enhancement 
    opened by amoazeni75 7
  • Problem with Dependencies #Docker Support

    Problem with Dependencies #Docker Support

    Hello I left a comment on Reddit saying I would give it a go, and you said if I had a problem to log it here, so here I am, with a problem 😊

    I seem to get stuck with pip3 install librosa I get this error Failed building wheel for llvmlite Running setup.py clean for llvmlite Failed to build llvmlite I can push on and get EfficientWord installed and working, if I say Alexa it says Yup I hear ya

    The problem is then when I try to create my own wake word I run this command … python3 -m eff_word_net.generate_reference [email protected]:~ $ python3 -m eff_word_net.generate_reference Paste Path of folder Containing audio files:/home/pi/wakewords Paste Path of location to save *_ref.json :/home/pi/wakewords Enter Wakeword Name :bender Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in run_code exec(code, run_globals) File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 80, in input("Enter Wakeword Name :") File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 47, in generate_reference_file x, = librosa.load(audio_file,sr=16000) AttributeError: module 'librosa' has no attribute 'load'

    My Problem is with librosa, I am not able to install it. I tried everything I could google but it will never install

    How did you get around this problem ?

    enhancement good first issue raspberrypi wake_word_generation 
    opened by Balro76 3
Releases(stable)
Nested cross-validation is necessary to avoid biased model performance in embedded feature selection in high-dimensional data with tiny sample sizes

Pruner for nested cross-validation - Sphinx-Doc Nested cross-validation is necessary to avoid biased model performance in embedded feature selection i

1 Dec 15, 2021
Deep generative models of 3D grids for structure-based drug discovery

What is liGAN? liGAN is a research codebase for training and evaluating deep generative models for de novo drug design based on 3D atomic density grid

Matt Ragoza 152 Jan 03, 2023
S-attack library. Official implementation of two papers "Are socially-aware trajectory prediction models really socially-aware?" and "Vehicle trajectory prediction works, but not everywhere".

S-attack library: A library for evaluating trajectory prediction models This library contains two research projects to assess the trajectory predictio

VITA lab at EPFL 71 Jan 04, 2023
Multiple style transfer via variational autoencoder

ST-VAE Multiple style transfer via variational autoencoder By Zhi-Song Liu, Vicky Kalogeiton and Marie-Paule Cani This repo only provides simple testi

13 Oct 29, 2022
Simple embedding based text classifier inspired by fastText, implemented in tensorflow

FastText in Tensorflow This project is based on the ideas in Facebook's FastText but implemented in Tensorflow. However, it is not an exact replica of

Alan Patterson 306 Dec 02, 2022
Normalization Matters in Weakly Supervised Object Localization (ICCV 2021)

Normalization Matters in Weakly Supervised Object Localization (ICCV 2021) 99% of the code in this repository originates from this link. ICCV 2021 pap

Jeesoo Kim 10 Feb 01, 2022
Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

CCOP Code of our paper Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning Requirement Install OpenSelfSup Install Detectron2

Chenhongyi Yang 21 Dec 13, 2022
Voice Conversion Using Speech-to-Speech Neuro-Style Transfer

This repo contains the official implementation of the VAE-GAN from the INTERSPEECH 2020 paper Voice Conversion Using Speech-to-Speech Neuro-Style Transfer.

Ehab AlBadawy 93 Jan 05, 2023
Bu repo SAHI uygulamasını mantığını öğreniyoruz.

SAHI-Learn: SAHI'den Beraber Kodlamak İster Misiniz Herkese merhabalar ben Kadir Nar. SAHI kütüphanesine gönüllü geliştiriciyim. Bu repo SAHI kütüphan

Kadir Nar 11 Aug 22, 2022
Official implementation of Deep Convolutional Dictionary Learning for Image Denoising.

DCDicL for Image Denoising Hongyi Zheng*, Hongwei Yong*, Lei Zhang, "Deep Convolutional Dictionary Learning for Image Denoising," in CVPR 2021. (* Equ

Z80 91 Dec 21, 2022
Generating Radiology Reports via Memory-driven Transformer

R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,

CUHK-SZ NLP Group 101 Dec 13, 2022
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
IOT: Instance-wise Layer Reordering for Transformer Structures

Introduction This repository contains the code for Instance-wise Ordered Transformer (IOT), which is introduced in the ICLR2021 paper IOT: Instance-wi

IOT 19 Nov 15, 2022
JumpDiff: Non-parametric estimator for Jump-diffusion processes for Python

jumpdiff jumpdiff is a python library with non-parametric Nadaraya─Watson estimators to extract the parameters of jump-diffusion processes. With jumpd

Rydin 28 Dec 10, 2022
[NeurIPS 2020] Official Implementation: "SMYRF: Efficient Attention using Asymmetric Clustering".

SMYRF: Efficient attention using asymmetric clustering Get started: Abstract We propose a novel type of balanced clustering algorithm to approximate a

Giannis Daras 46 Dec 22, 2022
This repository is the official implementation of Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models

Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models Link to paper Abstract We study prediction of future out

Rickard Karlsson 2 Aug 19, 2022
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".

Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without

sianchen 22 May 28, 2022
Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image

CenterPose Overview This repository is the official implementation of the paper "Single-stage Keypoint-based Category-level Object Pose Estimation fro

NVIDIA Research Projects 188 Dec 27, 2022
Use .csv files to record, play and evaluate motion capture data.

Purpose These scripts allow you to record mocap data to, and play from .csv files. This approach facilitates parsing of body movement data in statisti

21 Dec 12, 2022
An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Joseph Lee 7 Dec 25, 2022