Weaviate demo with the text2vec-openai module

Overview

Weaviate demo with the text2vec-openai module

This repository contains an example of how to use the Weaviate text2vec-openai module. When using this demo dataset, Weaviate will vectorize the data and the queries based on OpenAI's Babbage model.

What is Weaviate?

Weaviate is an open-source, modular vector search engine. It works like any other database you're used to (it has full CRUD support, it's cloud-native, etc), but it is created around the concept of storing all data objects based on the vector representations (i.e., embeddings) of these data objects. Within Weaviate you can mix traditional, scalar search filters with vector search filters through its GraphQL-API.

Weaviate modules can be used to -among other things- vectorize the data objects you add to Weaviate. In this demo, the text2vec-openai module is used to vectorize all data using OpenAI's Babbage model.

You can read about Weaviate in more detail in the software docs.

About the Dataset

This dataset contains descriptions of 34,886 movies from around the world. The dataset is taken from Kaggle.

Run the setup

Before running this setup, make sure you have an OpenAPI ready, you can create one here.

0. Update you OpenAI API key

$ export OPENAI_APIKEY=YOUR_API_KEY

1. Run the container

Run the container:

$ docker-compose up -d

2. Import the data

After the container starts up, you can import the data by running:

# Install the Weaviate Python client
$ pip3 install -r requirements.txt
# Import the data with the format `./import.py {URL} {OPENAI RATE LIMIT}`
$ ./import.py http://localhost:8080 550

Note: because the OpenAI API comes with a rate limit, we have taken this into account for this demo dataset. If you work with your own dataset and you've requested an increase/removal of your rate limit, you can increase the import speed. You can read here how to do this.

3. Query the data

You can query the data via the GraphQL interface that's available in the Weaviate Console (under "Self Hosted Weaviate").

Or you can test the example queries below.

Example Query

Learn how to use the Get{} function of the Weaviate GraphQL-API here.

{
  Get {
    Movie(
      nearText: {
        concepts: ["Movie about Venice"]
      }
      where: {
        path: ["year"]
        operator: LessThan
        valueInt: 1950
      }
      limit: 5
    ) {
      title
      plot
      year
      director {
        ... on Director {
          name
        }
      }
      genre {
        ... on Genre {
          name
        }
      }
    }
  }
}
Owner
SeMI Technologies
SeMI Technologies creates database software like the Weaviate vector search engine
SeMI Technologies
code for modular summarization work published in ACL2021 by Krishna et al

This repository contains the code for running modular summarization pipelines as described in the publication Krishna K, Khosla K, Bigham J, Lipton ZC

Approximately Correct Machine Intelligence (ACMI) Lab 21 Nov 24, 2022
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Ubiquitous Knowledge Processing Lab 59 Dec 01, 2022
Synthetic data for the people.

zpy: Synthetic data in Blender. Website • Install • Docs • Examples • CLI • Contribute • Licence Abstract Collecting, labeling, and cleaning data for

Zumo Labs 253 Dec 21, 2022
🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

Explosion 1.5k Dec 25, 2022
Natural Language Processing

NLP Natural Language Processing apps Multilingual_NLP.py start #This script is demonstartion of Mul

Ritesh Sharma 1 Oct 31, 2021
Toward a Visual Concept Vocabulary for GAN Latent Space, ICCV 2021

Toward a Visual Concept Vocabulary for GAN Latent Space Code and data from the ICCV 2021 paper Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Kl

Sarah Schwettmann 13 Dec 23, 2022
Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks arXiv link: upcoming To be published in Findings of NA

Allen 16 Nov 12, 2022
Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

UBC Computer Vision Group 358 Dec 24, 2022
🧪 Cutting-edge experimental spaCy components and features

spacy-experimental: Cutting-edge experimental spaCy components and features This package includes experimental components and features for spaCy v3.x,

Explosion 65 Dec 30, 2022
Fine-tuning scripts for evaluating transformer-based models on KLEJ benchmark.

The KLEJ Benchmark Baselines The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for the Polish language und

Allegro Tech 17 Oct 18, 2022
A Practitioner's Guide to Natural Language Processing

Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, Text

Dipanjan (DJ) Sarkar 1.5k Jan 03, 2023
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
Contact Extraction with Question Answering.

contactsQA Extraction of contact entities from address blocks and imprints with Extractive Question Answering. Goal Input: Dr. Max Mustermann Hauptstr

Jan 2 Apr 20, 2022
BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

BMInf (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

OpenBMB 377 Jan 02, 2023
Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

Vincent Rasneur 230 Nov 16, 2022
Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

Madhusudan.C.S 53 Mar 01, 2022
Bpe algorithm can finetune tokenizer - Bpe algorithm can finetune tokenizer

"# bpe_algorithm_can_finetune_tokenizer" this is an implyment for https://github

张博 1 Feb 02, 2022
The code for two papers: Feedback Transformer and Expire-Span.

transformer-sequential This repo contains the code for two papers: Feedback Transformer Expire-Span The training code is structured for long sequentia

Meta Research 125 Dec 25, 2022
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма

38 Jan 04, 2023
Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

This Repository contains a sample code for Tacotron 2, WaveGlow with multi-speaker, emotion embeddings together with a script for data preprocessing.

Ivan Didur 106 Jan 01, 2023