Topic Inference with Zeroshot models

Overview

zeroshot_topics

Table of Contents

Installation

zeroshot_topics is distributed on PyPI as a universal wheel and is available on Linux/macOS and Windows and supports Python 3.7+ and PyPy.

$ pip install zeroshot_topics

Usage

from zeroshot_topics import ZeroShotTopicFinder
zsmodel = ZeroShotTopicFinder()
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""
zsmodel.find_topic(text)

License

zeroshot_topics is distributed under the terms of

You might also like...
This repo stores the codes for topic modeling on palliative care journals.

This repo stores the codes for topic modeling on palliative care journals. Data Preparation You first need to download the journal papers. bash 1_down

topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

Biterm Topic Model (BTM): modeling topics in short texts
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x using fastT5.

Reduce T5 model size by 3X and increase the inference speed up to 5X. Install Usage Details Functionalities Benchmarks Onnx model Quantized onnx model

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS)

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech (BVAE-TTS) Yoonhyung Lee, Joongbo Shin, Kyomin Jung Abstract: Although early

Source code for AAAI20 "Generating Persona Consistent Dialogues by Exploiting Natural Language Inference".

Generating Persona Consistent Dialogues by Exploiting Natural Language Inference Source code for RCDG model in AAAI20 Generating Persona Consistent Di

LightSeq: A High-Performance Inference Library for Sequence Processing and Generation
LightSeq: A High-Performance Inference Library for Sequence Processing and Generation

LightSeq is a high performance inference library for sequence processing and generation implemented in CUDA. It enables highly efficient computation of modern NLP models such as BERT, GPT2, Transformer, etc. It is therefore best useful for Machine Translation, Text Generation, Dialog, Language Modelling, and other related tasks using these models.

Spert NLP Relation Extraction API deployed with torchserve for inference

SpERT torchserve Spert_torchserve is the Relation Extraction model (SpERT)Span-based Entity and Relation Transformer API deployed with pytorch/serve.

A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Comments
  • Error when I run the sample code

    Error when I run the sample code

    I get this when I try to run the sample code:

    Traceback (most recent call last): File "zerotopics.py", line 1, in from zeroshot_topics import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/init.py", line 3, in from .zeroshot_tm import ZeroShotTopicFinder File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/zeroshot_tm.py", line 3, in from .utils import load_zeroshot_model File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/site-packages/zeroshot_topics/utils.py", line 6, in def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"): File "/Users/scharlesworth/opt/anaconda3/envs/text_analytics/lib/python3.7/functools.py", line 490, in lru_cache raise TypeError('Expected maxsize to be an integer or None') TypeError: Expected maxsize to be an integer or None

    Specifics: Python version 3.7.9

    pip freeze gives (yeh this virtualenv is getting big :):

    absl-py==1.0.0 aiohttp==3.8.1 aiosignal==1.2.0 alabaster==0.7.12 aniso8601==9.0.1 antlr4-python3-runtime==4.8 appnope @ file:///opt/concourse/worker/volumes/live/4f734db2-9ca8-4d8b-5b29-6ca15b4b4772/volume/appnope_1606859466979/work async-timeout==4.0.2 asynctest==0.13.0 attrs==20.3.0 Babel==2.9.1 backcall @ file:///home/ktietz/src/ci/backcall_1611930011877/work bertopic==0.6.0 blis @ file:///opt/concourse/worker/volumes/live/cd6a6bea-d063-4b62-4c10-fcc89b17d0ac/volume/cython-blis_1594246851083/work boto3==1.17.86 botocore==1.20.86 brotlipy==0.7.0 cachetools==4.2.1 catalogue==2.0.6 certifi==2020.12.5 cffi @ file:///opt/concourse/worker/volumes/live/2aa8abfe-8b8d-4889-78d9-837b74c3cd64/volume/cffi_1606255119410/work chardet @ file:///opt/concourse/worker/volumes/live/9efbf151-b45b-463d-6340-a5c399bf00b7/volume/chardet_1607706825988/work charset-normalizer==2.0.9 click==7.1.2 colorama==0.4.4 coloredlogs==15.0.1 commonmark==0.9.1 cryptography @ file:///opt/concourse/worker/volumes/live/41c3d62a-f1f8-46ce-414a-9adaf4ea7d96/volume/cryptography_1607636752064/work cycler==0.10.0 cymem @ file:///opt/concourse/worker/volumes/live/3e8d7428-f57d-4000-44e7-34ac8a744f13/volume/cymem_1605062299053/work Cython==0.29.23 dataclasses==0.6 datasets==1.17.0 decorator @ file:///home/ktietz/src/ci/decorator_1611930055503/work dill==0.3.4 docformatter==1.4 docutils==0.15.2 emoji==1.6.1 en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.2.0/en_core_web_lg-3.2.0-py3-none-any.whl en-core-web-md @ https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.2.0/en_core_web_md-3.2.0-py3-none-any.whl en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl en-core-web-trf @ https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.2.0/en_core_web_trf-3.2.0-py3-none-any.whl et-xmlfile==1.1.0 fairscale==0.4.4 Faker==8.16.0 fasttext @ file:///Users/scharlesworth/fastText-0.9.2 filelock==3.0.12 flake8==4.0.1 flake8-bugbear==21.11.29 Flask==2.0.2 Flask-Cors==3.0.10 Flask-RESTful==0.3.9 frozenlist==1.2.0 fsspec==2021.11.1 future==0.18.2 gitdb==4.0.9 gitdb2==4.0.2 GitPython==3.1.24 google-api-core==1.26.2 google-api-python-client==2.0.2 google-auth==1.28.0 google-auth-httplib2==0.1.0 google-auth-oauthlib==0.4.6 googleapis-common-protos==1.53.0 grpcio==1.43.0 hdbscan==0.8.27 httplib2==0.19.0 huggingface-hub==0.2.1 humanfriendly==10.0 hydra-core==1.1.1 idna @ file:///tmp/build/80754af9/idna_1593446292537/work imagesize==1.3.0 importlib-metadata @ file:///tmp/build/80754af9/importlib-metadata_1602276842396/work importlib-resources==5.4.0 iniconfig==1.1.1 iopath==0.1.9 ipykernel @ file:///opt/concourse/worker/volumes/live/73e8766c-12c3-4f76-62a6-3dea9a7da5b7/volume/ipykernel_1596206701501/work/dist/ipykernel-5.3.4-py3-none-any.whl ipython @ file:///opt/concourse/worker/volumes/live/ac685347-76d6-4904-4b88-886c6a434f22/volume/ipython_1614616430264/work ipython-genutils @ file:///tmp/build/80754af9/ipython_genutils_1606773439826/work itsdangerous==2.0.1 jedi @ file:///opt/concourse/worker/volumes/live/5006b7b5-a924-4788-6cfe-ae05d8be8830/volume/jedi_1606932947370/work Jinja2==3.0.1 jmespath==0.10.0 joblib==1.0.1 jsonlines==3.0.0 jsonschema==3.0.2 jupyter-client @ file:///tmp/build/80754af9/jupyter_client_1601311786391/work jupyter-core @ file:///opt/concourse/worker/volumes/live/a699b83f-e941-4170-5136-bf87e3f37756/volume/jupyter_core_1612213304212/work keybert==0.5.0 kiwisolver==1.3.1 langcodes==3.3.0 llvmlite==0.36.0 loguru==0.5.3 Markdown==3.3.4 markdown-it-py==0.5.8 MarkupSafe==2.0.1 matplotlib==3.4.0 mccabe==0.6.1 mkl-fft==1.2.0 mkl-random==1.1.1 mkl-service==2.3.0 mock==4.0.3 multidict==5.2.0 multiprocess==0.70.12.2 murmurhash @ file:///opt/concourse/worker/volumes/live/9a0582f9-9097-4dab-6d7a-fcf62b4968ae/volume/murmurhash_1607456116622/work myst-parser==0.12.10 nltk==3.6.5 numba==0.53.1 numpy==1.20.2 oauthlib==3.1.1 omegaconf==2.1.1 openai==0.6.3 openpyxl==3.0.9 packaging==20.9 pandas==1.2.1 parlai==1.5.1 parquet==1.3.1 parso==0.7.0 pathy==0.6.1 pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pickleshare @ file:///tmp/build/80754af9/pickleshare_1606932040724/work Pillow==8.2.0 plac @ file:///opt/concourse/worker/volumes/live/a94b6881-2d18-4055-5a3c-f24036f05ef6/volume/plac_1594259982880/work pluggy==1.0.0 ply==3.11 portalocker==2.3.2 praw==7.1.0 prawcore==1.5.0 preshed @ file:///opt/concourse/worker/volumes/live/952fa955-acc7-4aa0-6766-86f802ea8ef1/volume/preshed_1608233410312/work prompt-toolkit @ file:///tmp/build/80754af9/prompt-toolkit_1616415428029/work protobuf==3.15.6 ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl py==1.11.0 py-gfm==1.0.2 py-rouge==1.1 py4j==0.10.7 pyarrow==6.0.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pybind11==2.6.1 pycodestyle==2.8.0 pycparser @ file:///tmp/build/80754af9/pycparser_1594388511720/work pydantic==1.8.2 pyee==8.2.2 pyflakes==2.4.0 Pygments @ file:///tmp/build/80754af9/pygments_1615143339740/work PyJWT==2.3.0 pynndescent==0.5.2 pyodbc==4.0.32 pyOpenSSL @ file:///tmp/build/80754af9/pyopenssl_1608057966937/work pyparsing==2.4.7 pyrsistent @ file:///opt/concourse/worker/volumes/live/656e0c1b-ef87-4251-4a51-1290b2351993/volume/pyrsistent_1600141745371/work PySocks @ file:///opt/concourse/worker/volumes/live/ef943889-94fc-4539-798d-461c60b77804/volume/pysocks_1605305801690/work pytest==6.2.5 pytest-datadir==1.3.1 pytest-regressions==2.2.0 python-dateutil @ file:///home/ktietz/src/ci/python-dateutil_1611928101742/work python-slugify==5.0.2 pytorch-transformers==1.2.0 pytz==2020.5 PyYAML==6.0 pyzmq==20.0.0 regex==2021.11.10 requests @ file:///tmp/build/80754af9/requests_1608241421344/work requests-mock==1.9.3 requests-oauthlib==1.3.0 requests-toolbelt==0.9.1 rich==10.16.2 rsa==4.7.2 s3transfer==0.4.2 sacremoses==0.0.44 scikit-learn==0.24.1 scipy==1.6.2 seaborn==0.11.1 sentence-transformers==1.0.4 sentencepiece==0.1.91 seqeval==0.0.5 sh==1.14.2 six @ file:///opt/concourse/worker/volumes/live/f983ba11-c9fe-4dff-7ce7-d89b95b09771/volume/six_1605205318156/work sklearn==0.0 slack-bolt==1.11.1 slack-sdk==3.13.0 slackclient==2.9.3 slackeventsapi==3.0.1 smart-open==5.2.1 smmap==5.0.0 snowballstemmer==2.2.0 spacy==3.2.0 spacy-alignments==0.8.4 spacy-legacy==3.0.8 spacy-loggers==1.0.1 spacy-sentence-bert==0.1.2 spacy-transformers==1.1.2 spark-nlp==3.0.2 Sphinx==2.2.2 sphinx-autodoc-typehints==1.10.3 sphinx-rtd-theme==1.0.0 sphinxcontrib-applehelp==1.0.2 sphinxcontrib-devhelp==1.0.2 sphinxcontrib-htmlhelp==2.0.0 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==1.0.3 sphinxcontrib-serializinghtml==1.1.5 srsly==2.4.2 subword-nmt==0.3.8 tensorboard==2.7.0 tensorboard-data-server==0.6.1 tensorboard-plugin-wit==1.8.0 tensorboardX==2.4.1 text-unidecode==1.3 thinc==8.0.13 threadpoolctl==2.1.0 thriftpy2==0.4.14 tokenizers==0.10.2 toml==0.10.2 torch==1.10.1 torchtext==0.11.1 tornado @ file:///opt/concourse/worker/volumes/live/d531d395-893c-4ca1-6a5f-717b318eb08c/volume/tornado_1606942307627/work tqdm==4.62.3 traitlets @ file:///home/ktietz/src/ci/traitlets_1611929699868/work transformers==4.11.0 typer==0.4.0 typing-extensions==3.7.4.3 umap-learn==0.5.1 Unidecode==1.3.2 untokenize==0.1.1 update-checker==0.18.0 uritemplate==3.0.1 urllib3==1.26.7 wasabi==0.8.2 wcwidth @ file:///tmp/build/80754af9/wcwidth_1593447189090/work webexteamsbot==0.1.4.2 webexteamssdk==1.6 websocket-client==0.57.0 websocket-server==0.6.4 Werkzeug==2.0.1 xlrd==2.0.1 xxhash==2.0.2 yarl==1.7.2 zeroshot-topics==0.1.0 zipp @ file:///tmp/build/80754af9/zipp_1604001098328/work

    opened by sdcharle 1
  • Add size to lru_cache

    Add size to lru_cache

    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/__init__.py in <module>()
          1 __version__ = '0.1.0'
          2 
    ----> 3 from .zeroshot_tm import ZeroShotTopicFinder
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/zeroshot_tm.py in <module>()
          1 import attr
          2 from keybert import KeyBERT
    ----> 3 from .utils import load_zeroshot_model
          4 from nltk.corpus import wordnet as wn
          5 
    
    /usr/local/lib/python3.7/dist-packages/zeroshot_topics/utils.py in <module>()
          4 
          5 @lru_cache
    ----> 6 def load_zeroshot_model(model_name="valhalla/distilbart-mnli-12-6"):
          7     classifier = pipeline("zero-shot-classification", model=model_name)
          8     return classifier
    
    /usr/lib/python3.7/functools.py in lru_cache(maxsize, typed)
        488             maxsize = 0
        489     elif maxsize is not None:
    --> 490         raise TypeError('Expected maxsize to be an integer or None')
        491 
        492     def decorating_function(user_function):
    
    TypeError: Expected maxsize to be an integer or None
    

    I assume that you have to provide, maxsize parameter to lru_cache. Worked for me, when I provided the parameter.

    opened by gsasikiran 6
Releases(v.0.0.1)
Owner
Rita Anjana
ML engineer
Rita Anjana
Pipeline for training LSA models using Scikit-Learn.

Latent Semantic Analysis Pipeline for training LSA models using Scikit-Learn. Usage Instead of writing custom code for latent semantic analysis, you j

Dani El-Ayyass 23 Sep 05, 2022
Yet Another Compiler Visualizer

yacv: Yet Another Compiler Visualizer yacv is a tool for visualizing various aspects of typical LL(1) and LR parsers. Check out demo on YouTube to see

Ashutosh Sathe 129 Dec 17, 2022
PyTorch Language Model for 1-Billion Word (LM1B / GBW) Dataset

PyTorch Large-Scale Language Model A Large-Scale PyTorch Language Model trained on the 1-Billion Word (LM1B) / (GBW) dataset Latest Results 39.98 Perp

Ryan Spring 114 Nov 04, 2022
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
Natural language processing summarizer using 3 state of the art Transformer models: BERT, GPT2, and T5

NLP-Summarizer Natural language processing summarizer using 3 state of the art Transformer models: BERT, GPT2, and T5 This project aimed to provide in

Samuel Sharkey 1 Feb 07, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Torch-Struct: Structured Prediction Library A library of tested, GPU implementations of core structured prediction algorithms for deep learning applic

HNLP 1.1k Dec 16, 2022
Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.

TRICE: a task-agnostic transferring framework for multi-source sequence generation This is the source code of our work Transfer Learning for Sequence

THUNLP-MT 9 Jun 27, 2022
Telegram bot to auto post messages of one channel in another channel as soon as it is posted, without the forwarded tag.

Channel Auto-Post Bot This bot can send all new messages from one channel, directly to another channel (or group, just in case), without the forwarded

Aditya 128 Dec 29, 2022
Summarization module based on KoBART

KoBART-summarization Install KoBART pip install git+https://github.com/SKT-AI/KoBART#egg=kobart Requirements pytorch==1.7.0 transformers==4.0.0 pytor

seujung hwan, Jung 148 Dec 28, 2022
ProteinBERT is a universal protein language model pretrained on ~106M proteins from the UniRef90 dataset.

ProteinBERT is a universal protein language model pretrained on ~106M proteins from the UniRef90 dataset. Through its Python API, the pretrained model can be fine-tuned on any protein-related task in

241 Jan 04, 2023
A look-ahead multi-entity Transformer for modeling coordinated agents.

baller2vec++ This is the repository for the paper: Michael A. Alcorn and Anh Nguyen. baller2vec++: A Look-Ahead Multi-Entity Transformer For Modeling

Michael A. Alcorn 30 Dec 16, 2022
Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

TOPSIS implementation in Python Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) CHING-LAI Hwang and Yoon introduced TOPSIS

Hamed Baziyad 8 Dec 10, 2022
IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Digital Phonetics at the University of Stuttgart 247 Jan 05, 2023
CredData is a set of files including credentials in open source projects

CredData is a set of files including credentials in open source projects. CredData includes suspicious lines with manual review results and more information such as credential types for each suspicio

Samsung 19 Sep 07, 2022
English loanwords in the world's languages

Wiktionary as CLDF Content cldf1 and cldf2 contain cldf-conform data sets with a total of 2 377 756 entries about the vocabulary of all 1403 languages

Viktor Martinović 3 Jan 14, 2022
Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Seq2Seq Speech in JAX A JAX/Flax repository for combining a pre-trained speech encoder model (e.g. Wav2Vec2, HuBERT, WavLM) with a pre-trained text de

Sanchit Gandhi 21 Dec 14, 2022
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

VAENAR-TTS - PyTorch Implementation PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Keon Lee 67 Nov 14, 2022
Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

Breame ( British English and American English) Breame is a lightweight Python package with a number of utility tools to aid in the detection of words

Charles 8 Oct 10, 2022
Différents programmes créant une interface graphique a l'aide de Tkinter pour simplifier la vie des étudiants.

GP211-Grand-Projet Ce repertoire contient tout les programmes nécessaires au bon fonctionnement de notre projet-logiciel. Cette interface graphique es

1 Dec 21, 2021
HAN2HAN : Hangul Font Generation

HAN2HAN : Hangul Font Generation

Changwoo Lee 36 Dec 28, 2022