On-device speech-to-index engine powered by deep learning.

Octopus

Made in Vancouver, Canada by Picovoice

Octopus is Picovoice's Speech-to-Index engine. It directly indexes speech without relying on a text representation. This acoustic-only approach boosts accuracy by removing the out-of-vocabulary limitation and eliminating the problem of competing hypotheses (e.g., homophones).

Table of Contents

  • Demos
  • SDKs
  • Releases

Demos

Python Demos

Install the demo package:

sudo pip3 install pvoctopusdemo

Run the following in the terminal:

octopus_demo --access_key {AccessKey} --audio_paths ${AUDIO_PATHS}

Replace {AccessKey} with your AccessKey obtained from Picovoice Console and ${AUDIO_PATHS} with a space-separated list of audio files. Octopus processes the audio files, then interactively prompts you for search phrases and displays the results.

For more information about the Python demos go to demo/python.

C Demos

Build the demo:

cmake -S demo/c/ -B demo/c/build && cmake --build demo/c/build

Index a given audio file:

./demo/c/build/octopus_index_demo ${LIBRARY_PATH} ${ACCESS_KEY} ${AUDIO_PATH} ${INDEX_PATH}

Then search the index for a given phrase:

./demo/c/build/octopus_search_demo ${LIBRARY_PATH} ${MODEL_PATH} ${ACCESS_KEY} ${INDEX_PATH} ${SEARCH_PHRASE}

Replace ${LIBRARY_PATH} with the path to the appropriate library under lib, ${MODEL_PATH} with the path to the model file under lib/common, ${ACCESS_KEY} with your AccessKey obtained from Picovoice Console, ${AUDIO_PATH} with the path to the audio file to be indexed, ${INDEX_PATH} with the path at which the index file will be cached, and ${SEARCH_PHRASE} with a search phrase.

For more information about C demos go to demo/c.

Android Demos

Using Android Studio, open demo/android/OctopusDemo as an Android project.

Replace "${YOUR_ACCESS_KEY_HERE}" inside MainActivity.java with your AccessKey obtained from Picovoice Console. Then run the demo.

For more information about Android demos go to demo/android.

iOS Demos

From the demo/ios/OctopusDemo directory, run the following to install the Octopus CocoaPod:

pod install

Replace "{YOUR_ACCESS_KEY_HERE}" inside ViewModel.swift with your AccessKey obtained from Picovoice Console. Then, using Xcode, open the generated OctopusDemo.xcworkspace and run the application.

For more information about iOS demos go to demo/ios.

Web Demos

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

SDKs

Python

Create an instance of the engine:

import pvoctopus
access_key = ""  # AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
handle = pvoctopus.create(access_key=access_key)

Index your raw audio data or file:

audio_data = [..]
metadata = handle.index(audio_data)
# or 
audio_file_path = "/path/to/my/audiofile.wav"
metadata = handle.index_file(audio_file_path)
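
If your audio is stored in a WAV file but you want to pass raw samples to handle.index(), the following is a minimal sketch of loading them with Python's standard library. It assumes the file is single-channel, 16-bit PCM (check the Octopus documentation for the exact audio format the engine expects); read_wave is a hypothetical helper, not part of the SDK.

import struct
import wave

def read_wave(path):
    # Hypothetical helper: load a single-channel 16-bit PCM WAV file into a list of samples.
    with wave.open(path, 'rb') as wav_file:
        assert wav_file.getnchannels() == 1, "expected single-channel audio"
        assert wav_file.getsampwidth() == 2, "expected 16-bit samples"
        frames = wav_file.readframes(wav_file.getnframes())
    return list(struct.unpack('%dh' % (len(frames) // 2), frames))

audio_data = read_wave("/path/to/my/audiofile.wav")
metadata = handle.index(audio_data)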

Then search the metadata for phrases:

matches = handle.search(metadata, ['avocado'])
avocado_matches = matches['avocado']
for match in avocado_matches:
    print(f"Match for `avocado`: {match.start_sec} -> {match.end_sec} ({match.probability})")

When done, the handle's resources must be released explicitly:

handle.delete()
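
Putting the steps together, a minimal end-to-end sketch could look like the following. The audio path and search phrases are placeholders, and the search() call and match fields follow the example above.

import pvoctopus

access_key = ""  # AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
handle = pvoctopus.create(access_key=access_key)
try:
    # Index the file once; the resulting metadata can be searched many times.
    metadata = handle.index_file("/path/to/my/audiofile.wav")

    # Search the same index for several phrases in a single call.
    matches = handle.search(metadata, ['avocado', 'blueberry'])
    for phrase, phrase_matches in matches.items():
        for match in phrase_matches:
            print(f"`{phrase}`: {match.start_sec:.2f}s -> {match.end_sec:.2f}s "
                  f"(probability: {match.probability:.2f})")
finally:
    # Release engine resources when done.
    handle.delete()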

C

The pv_octopus.h header file contains the relevant API information. Create an instance of the engine:

const char *model_path = "..."; // absolute path to the model file available at `lib/common/octopus_params.pv`
const char *access_key = "..."; // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
pv_octopus_t *handle = NULL;
pv_status_t status = pv_octopus_init(access_key, model_path, &handle);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Index audio data using the constructed object:

const char *audio_path = "..."; // absolute path to the audio file to be indexed
void *indices = NULL;
int32_t num_indices_bytes = 0;
pv_status_t status = pv_octopus_index_file(handle, audio_path, &indices, &num_indices_bytes);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

Search the indexed data:

const char *phrase = "...";
pv_octopus_match_t *matches = NULL;
int32_t num_matches = 0;
pv_status_t status = pv_octopus_search(handle, indices, num_indices_bytes, phrase, &matches, &num_matches);
if (status != PV_STATUS_SUCCESS) {
    // error handling logic
}

When done, be sure to release the acquired resources:

pv_octopus_delete(handle);

Android

Create an instance of the engine:

import ai.picovoice.octopus.*;

final String accessKey = "..."; // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
try {
    Octopus handle = new Octopus.Builder(accessKey).build(appContext);
} catch (OctopusException ex) { /* handle error */ }

Index audio data using the constructed object:

final String audioFilePath = "/path/to/my/audiofile.wav";
try {
    OctopusMetadata metadata = handle.indexAudioFile(audioFilePath);
} catch (OctopusException ex) { /* handle error */ }

Search the indexed data:

HashMap<String, OctopusMatch[]> matches = handle.search(metadata, phrases);

for (Map.Entry<String, OctopusMatch[]> entry : matches.entrySet()) {
    final String phrase = entry.getKey();
    for (OctopusMatch phraseMatch : entry.getValue()) {
        final float startSec = phraseMatch.getStartSec();
        final float endSec = phraseMatch.getEndSec();
        final float probability = phraseMatch.getProbability();
    }
}

When done, be sure to release the acquired resources:

metadata.delete();
handle.delete();

iOS

Create an instance of the engine:

import Octopus

let accessKey: String = "..." // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
do {
    let handle = try Octopus(accessKey: accessKey)
} catch { }

Index audio data using the constructed object:

let audioFilePath = "/path/to/my/audiofile.wav"
do {
    let metadata = try handle.indexAudioFile(path: audioFilePath)
} catch { }

Search the indexed data:

let matches: Dictionary<String, [OctopusMatch]> = try handle.search(metadata: metadata, phrases: phrases)
for (phrase, phraseMatches) in matches {
    for phraseMatch in phraseMatches {
        let startSec = phraseMatch.startSec
        let endSec = phraseMatch.endSec
        let probability = phraseMatch.probability
    }
}

When done, be sure to release the acquired resources:

handle.delete()

Web

Octopus is available on modern web browsers (i.e., not Internet Explorer) via WebAssembly. Octopus is provided pre-packaged as a Web Worker to allow it to perform processing off the main thread.

Vanilla JavaScript and HTML (CDN Script Tag)

">
>
<html lang="en">

<head>
  <script src="https://unpkg.com/@picovoice/octopus-web-en-worker/dist/iife/index.js">script>
  <script type="application/javascript">
    // The metadata object to save the result of indexing for later searches
    let octopusMetadata = undefined

    function octopusIndexCallback(metadata) {
      octopusMetadata = metadata
    }

    function octopusSearchCallback(matches) {
      console.log(`Search results (${matches.length}):`)
      console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`)
    }

    async function startOctopus() {
      // Create an Octopus Worker
      // Note: you receive a Worker object, _not_ an individual Octopus instance
      const accessKey = ... // AccessKey string provided by Picovoice Console (https://picovoice.ai/console/)
      const OctopusWorker = await OctopusWorkerFactory.create(
        accessKey,
        octopusIndexCallback,
        octopusSearchCallback
      )
    }

    document.addEventListener("DOMContentLoaded", function () {
      startOctopus();
      // Send Octopus the audio signal
      const audioSignal = new Int16Array(/* Provide data with correct format*/)
      OctopusWorker.postMessage({
        command: "index",
        input: audioSignal,
      });
    });

    const searchText = ...
    OctopusWorker.postMessage({
      command: "search",
      metadata: octopusMetadata,
      searchPhrase: searchText,
    });
  script>
head>

<body>body>

html>

Vanilla JavaScript and HTML (ES Modules)

yarn add @picovoice/octopus-web-en-worker

(or)

npm install @picovoice/octopus-web-en-worker
import { OctopusWorkerFactory } from "@picovoice/octopus-web-en-worker";

// The metadata object to save the result of indexing for later searches
let octopusMetadata = undefined;

// The Octopus Worker, created in startOctopus()
let octopusWorker = undefined;

function octopusIndexCallback(metadata) {
  octopusMetadata = metadata;
}

function octopusSearchCallback(matches) {
  console.log(`Search results (${matches.length}):`);
  for (const match of matches) {
    console.log(`Start: ${match.startSec}s -> End: ${match.endSec}s (Probability: ${match.probability})`);
  }
}

async function startOctopus() {
  // Create an Octopus Worker
  // Note: you receive a Worker object, _not_ an individual Octopus instance
  const accessKey = ... // AccessKey provided by Picovoice Console (https://picovoice.ai/console/)
  octopusWorker = await OctopusWorkerFactory.create(
    accessKey,
    octopusIndexCallback,
    octopusSearchCallback
  );
}

startOctopus();

...

// Send Octopus the audio signal
const audioSignal = new Int16Array(/* Provide data with correct format */);
octopusWorker.postMessage({
  command: "index",
  input: audioSignal,
});

...

const searchText = ...;
octopusWorker.postMessage({
  command: "search",
  metadata: octopusMetadata,
  searchPhrase: searchText,
});

Releases

v1.0.0 Oct 8th, 2021

  • Initial release.

