Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

Overview

keytotext

pypi Version Downloads Open In Colab Streamlit App API Call Docker Call HuggingFace Documentation Status Code style: black CodeFactor

keytotext

Idea is to build a model which will take keywords as inputs and generate sentences as outputs.

Potential use case can include:

  • Marketing
  • Search Engine Optimization
  • Topic generation etc.
  • Fine tuning of topic modeling models

Model:

Keytotext is based on the Amazing T5 Model: HuggingFace

  • k2t: Model
  • k2t-base: Model
  • mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training Notebooks can be found in the Training Notebooks Folder

Note: To add your own model to keytotext Please read Models Documentation

Usage:

Example usage: Open In Colab

Example Notebooks can be found in the Notebooks Folder

pip install keytotext

carbon (3)

Trainer:

Keytotext now has a trainer class than be used to train and finetune any T5 based model on new data. Updated Trainer docs here: Docs

Trainer example here: Open In Colab

from keytotext import trainer

carbon (6)

UI:

UI: Streamlit App

pip install streamlit-tags

This uses a custom streamlit component built by me: GitHub

image

API:

API: API Call Docker Call

The API is hosted in the Docker container and it can be run quickly. Follow instructions below to get started

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This will start the api at port 8000 visit the url below to get the results as below:

http://localhost:8000/api?data=["India","Capital","New Delhi"]

k2t_json

Note: The Hosted API is only available on demand

BibTex:

To quote keytotext please use this citation

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

Articles about keytotext:

Comments
  • ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)

    Hi,

    I tried to install keytotext via pip install keytotext --upgrade in local machine.

    but came across the following :

    ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
    ERROR: No matching distribution found for keytotext
    

    My pip version is the latest. However, the above works just fine in colab. Please guide me through the fix?

    opened by abhijithneilabraham 6
  • Add finetuning model to keytotext

    Add finetuning model to keytotext

    Is your feature request related to a problem? Please describe. Its difficult to use it without fine-tuning on new corpus so we need to build script to finetune it on new corpus

    enhancement good first issue 
    opened by gagan3012 2
  • "Oh no." ?

    "Error running app. If this keeps happening, please file an issue."

    Ok,...sure? I know nothing about this app.

    Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

    Chrome browser, Linux.

    opened by drscotthawley 2
  • Add Citations

    Add Citations

    Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    opened by gagan3012 1
  • Adding new models to keytotext

    Adding new models to keytotext

    Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    enhancement good first issue 
    opened by gagan3012 1
  • Inference API for Keytotext

    Inference API for Keytotext

    Is your feature request related to a problem? Please describe. It is difficult to host the UI on streamlit without API

    Describe the solution you'd like Inference API

    enhancement good first issue 
    opened by gagan3012 1
  • Create Better UI

    Create Better UI

    Is your feature request related to a problem? Please describe. The current UI is not functional It needs to be fixed

    Describe the solution you'd like Better UI with a nicer design

    enhancement 
    opened by gagan3012 1
  • Add `st.cache` to load model

    Add `st.cache` to load model

    Hi @gagan3012,

    Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

    Thought I'd send you a small PR which should fix this. You've already been on a good way with using st.cache but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again! Plus, I've seen that it also works a bit faster now ;).

    Hope this works for you and let me know if you have any other questions! 🎈

    Cheers, Johannes

    opened by jrieke 1
  • ValueError: transformers.models.auto.__spec__ is None

    ValueError: transformers.models.auto.__spec__ is None

    'from keytotext import pipeline'

    While running the above line, it is showing this error . "ValueError: transformers.models.auto.spec is None"

    opened by varunakk 0
  • Update README.md

    Update README.md

    Description

    Motivation and Context

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [ ] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by gagan3012 0
  • Update trainer.py

    Update trainer.py

    Description

    Motivation and Context

    How Has This Been Tested?

    Screenshots (if appropriate):

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [ ] My code follows the code style of this project.
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by gagan3012 0
  • Pipeline error on fresh install

    Pipeline error on fresh install

    Hi I'm getting this on a first run and fresh install

    Global seed set to 42 Traceback (most recent call last): File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module> from keytotext import pipeline File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module> from .dataset import make_dataset File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module> from cv2 import randShuffle ModuleNotFoundError: No module named 'cv2'

    opened by skintflickz 0
  • New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

    I have imported the model and necessary libraries. I am getting the below error in google colab. I have used this model earlier also few months back and it was working fine. This is the new issue I am facing recently with the same code.


    TypeError: init() got an unexpected keyword argument 'progress_bar_refresh_rate'

    Imported libraries:

    !pip install keytotext --upgrade !sudo apt-get install git-lfs

    from keytotext import trainer

    Training Model:

    model = trainer() model.from_pretrained(model_name="t5-small") model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5,use_gpu=True) model.save_model()

    Have attached error screenshot

    • OS: Windows
    • Browser Chrome Error
    opened by aishwaryapisal9 2
  • Update trainer.py

    Update trainer.py

    Delete progress_bar_refresh_rate in trainer.py

    Description

    delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the latest version (1.7.0) of PyTorch.Lightning.Trainer module

    Motivation and Context

    having this argument fails the training process

    How Has This Been Tested?

    Ran key to text on the custom dataset before and after August 2nd, 2022. Changes in the new version of Pytorch Lightning's Trainer were put into effect on that date where the above argument was removed and hence, the custom training failed since that day.

    Screenshots (if appropriate):

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)

    Checklist:

    • [x] My code follows the code style of this project.
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have read the CONTRIBUTING document.
    opened by anath2110benten 0
  • Why is cv2 required?

    Why is cv2 required?

    https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

    I'm using this framework to generate text from knowlege graph. Python interpreter keeps throwing "cv2 not installed" exception. Looks like the pip package doesn't contains cv2 as dependancy. I tried to delete this line in source code, the model works well. Is this line necessary for this project? Concerning about adding opencv to pip package? Thanks for your concern.

    opened by ChunxuYang 0
  • Hi, I notice that given the same input keywords, across different runs, the generated text are the same, even setting different seeds by 'pl.seed_everything(..)'.

    Hi, I notice that given the same input keywords, across different runs, the generated text are the same, even setting different seeds by 'pl.seed_everything(..)'.

    Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

    Describe the solution you'd like A clear and concise description of what you want to happen.

    Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

    Additional context Add any other context or screenshots about the feature request here.

    opened by RuiFeiHe 6
Releases(v1.5.0)
Owner
Gagan Bhatia
Software Developer | Machine Learning Enthusiast
Gagan Bhatia
CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

fastNLP 342 Jan 05, 2023
pyMorfologik MorfologikpyMorfologik - Python binding for Morfologik.

Python binding for Morfologik Morfologik is Polish morphological analyzer. For more information see http://github.com/morfologik/morfologik-stemming/

Damian Mirecki 18 Dec 29, 2021
LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation

LOT: A Benchmark for Evaluating Chinese Long Text Understanding and Generation Tasks | Datasets | LongLM | Baselines | Paper Introduction LOT is a ben

46 Dec 28, 2022
ConvBERT: Improving BERT with Span-based Dynamic Convolution

ConvBERT Introduction In this repo, we introduce a new architecture ConvBERT for pre-training based language model. The code is tested on a V100 GPU.

YITUTech 237 Dec 10, 2022
This code is the implementation of Text Emotion Recognition (TER) with linguistic features

APSIPA-TER This code is the implementation of Text Emotion Recognition (TER) with linguistic features. The network model is BERT with a pretrained mod

kenro515 1 Feb 08, 2022
Unsupervised Language Model Pre-training for French

FlauBERT and FLUE FlauBERT is a French BERT trained on a very large and heterogeneous French corpus. Models of different sizes are trained using the n

GETALP 212 Dec 10, 2022
MiCECo - Misskey Custom Emoji Counter

MiCECo Misskey Custom Emoji Counter Introduction This little script counts custo

7 Dec 25, 2022
PyTorch Implementation of "Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging" (Findings of ACL 2022)

Feature_CRF_AE Feature_CRF_AE provides a implementation of Bridging Pre-trained Language Models and Hand-crafted Features for Unsupervised POS Tagging

Jacob Zhou 6 Apr 29, 2022
Retraining OpenAI's GPT-2 on Discord Chats

Train OpenAI's GPT-2 on Discord Chats Retraining a Text Generation Model on Discord Chats using gpt-2-simple that wraps existing model fine-tuning and

Ayush Mishra 4 Oct 27, 2022
Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the

Galois Autocompleter 91 Sep 23, 2022
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Hiring We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on NLP and large-scale pre-traine

Microsoft 7.8k Jan 09, 2023
Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System Authors: Yixuan Su, Lei Shu, Elman Mansimov, Arshit Gupta, Deng Cai, Yi-An Lai

Amazon Web Services - Labs 124 Jan 03, 2023
Bu Chatbot, Konya Bilim Merkezi Yen için tasarlanmış olan bir projedir.

chatbot Bu Chatbot, Konya Bilim Merkezi Yeni Ufuklar Sergisi için 2021 Yılında tasarlanmış olan bir projedir. Chatbot Python ortamında yazılmıştır. Sö

Emre Özkul 1 Feb 23, 2022
A collection of GNN-based fake news detection models.

This repo includes the Pytorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Prefere

SafeGraph 251 Jan 01, 2023
Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised Text Classification Based on Keyword Graph How to run? Download data Our dataset follows previous works. For long texts, we follow C

Hello_World 20 Dec 29, 2022
Natural Language Processing Best Practices & Examples

NLP Best Practices In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive bus

Microsoft 6.1k Dec 31, 2022
Behavioral Testing of Clinical NLP Models

Behavioral Testing of Clinical NLP Models This repository contains code for testing the behavior of clinical prediction models based on patient letter

Betty van Aken 2 Sep 20, 2022
NLP-based analysis of poor Chinese movie reviews on Douban

douban_embedding 豆瓣中文影评差评分析 1. NLP NLP(Natural Language Processing)是指自然语言处理,他的目的是让计算机可以听懂人话。 下面是我将2万条豆瓣影评训练之后,随意输入一段新影评交给神经网络,最终AI推断出的结果。 "很好,演技不错

3 Apr 15, 2022
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Tsukinousag1 3 Apr 02, 2022
BeautyNet is an AI powered model which can tell you whether you're beautiful or not.

BeautyNet BeautyNet is an AI powered model which can tell you whether you're beautiful or not. Download Dataset from here:https://www.kaggle.com/gpios

Ansh Gupta 0 May 06, 2022