The idea is to build a model that takes keywords as input and generates sentences as output.

Last update: Jan 03, 2023

Overview

keytotext


Potential use cases include:

Marketing
Search engine optimization
Topic generation
Fine-tuning of topic modeling models

Model:

Keytotext is based on the T5 model:

k2t: Model
k2t-base: Model
mrm8488/t5-base-finetuned-common_gen (by Manuel Romero): Model

Training notebooks can be found in the Training Notebooks folder.

Note: To add your own model to keytotext, please read the Models Documentation.

Usage:

Example usage:

Example Notebooks can be found in the Notebooks Folder

pip install keytotext
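
A minimal sketch of basic usage. The `from keytotext import pipeline` import is the one quoted in the issue reports in the Comments section; the call shape (`pipeline("k2t")` returning a callable that takes a list of keywords) and the `clean_keywords` helper are assumptions made for illustration, so check the example notebooks for the exact API.

```python
def clean_keywords(keywords):
    """Strip whitespace and drop empty or repeated keywords, keeping order."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        keyword = keyword.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            cleaned.append(keyword)
    return cleaned


def generate_sentence(keywords, model_name="k2t"):
    """Generate a sentence from keywords (assumed keytotext call shape)."""
    # Imported lazily so clean_keywords works without keytotext installed.
    from keytotext import pipeline

    nlp = pipeline(model_name)  # assumption: the model name is the first argument
    return nlp(clean_keywords(keywords))  # assumption: callable takes a keyword list
```

With the package installed, `generate_sentence(["India", "Capital", "New Delhi"])` would download the model on first use, so expect the first call to be slow.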

Trainer:

Keytotext now has a trainer class that can be used to train and fine-tune any T5-based model on new data. Updated Trainer docs here: Docs

Trainer example here:

from keytotext import trainer
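
Before training, the data has to be split into a training set and a test set. The sketch below does that with the standard library only; the `split_rows` helper, the sample rows, and the "keywords" and "text" field names are assumptions for illustration, so check the Trainer docs for the columns the trainer actually expects.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical training rows; the field names are an assumption.
rows = [
    {"keywords": ["India", "Capital", "New Delhi"], "text": "New Delhi is the capital of India."},
    {"keywords": ["France", "Capital", "Paris"], "text": "Paris is the capital of France."},
    {"keywords": ["Japan", "Capital", "Tokyo"], "text": "Tokyo is the capital of Japan."},
    {"keywords": ["Kenya", "Capital", "Nairobi"], "text": "Nairobi is the capital of Kenya."},
    {"keywords": ["Peru", "Capital", "Lima"], "text": "Lima is the capital of Peru."},
]

train_rows, test_rows = split_rows(rows)

# With pandas and keytotext installed, the call shape reported in the
# Comments section is:
#   model = trainer()
#   model.from_pretrained(model_name="t5-small")
#   model.train(train_df=..., test_df=..., batch_size=3, max_epochs=5, use_gpu=True)
#   model.save_model()
```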

UI:

pip install streamlit-tags

This uses a custom Streamlit component built by me: GitHub

API:

The API is packaged as a Docker container and can be started quickly. Follow the instructions below to get started:

docker pull gagan30/keytotext

docker run -dp 8000:8000 gagan30/keytotext

This starts the API on port 8000. Visit the URL below to get the results:

http://localhost:8000/api?data=["India","Capital","New Delhi"]

Note: The Hosted API is only available on demand
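
The request above can also be built from Python. This is a minimal sketch that assumes the container is running locally on port 8000 and that the endpoint takes a JSON array in the `data` query parameter, as in the example URL; the helper names are hypothetical and the response format is not documented here, so `fetch_sentence` returns the raw body.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen


def build_api_url(keywords, base_url="http://localhost:8000/api"):
    """Return the request URL with the keywords encoded as a JSON array."""
    payload = json.dumps(list(keywords), separators=(",", ":"))
    # urlencode percent-escapes the brackets, quotes, and spaces.
    return f"{base_url}?{urlencode({'data': payload})}"


def keywords_from_url(url):
    """Inverse of build_api_url, handy for checking the encoding."""
    return json.loads(parse_qs(urlparse(url).query)["data"][0])


def fetch_sentence(keywords, timeout=30):
    """Call the local API and return the raw response body as text."""
    with urlopen(build_api_url(keywords), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Encoding the query string this way avoids problems with the raw quotes and spaces in the URL when it is used outside a browser.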

BibTex:

To cite keytotext, please use this citation:

@misc{bhatia, 
      title={keytotext},
      url={https://github.com/gagan3012/keytotext}, 
      journal={GitHub}, 
      author={Bhatia, Gagan}
}

References

https://github.com/Shivanandroy/simpleT5 (Shivanand Roy)
https://github.com/patil-suraj/question_generation (Suraj Patil)
https://github.com/MathewAlexander/T5_nlg (Mathew Alexander)

Articles about keytotext:

https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45 (Mathew Alexander)
https://www.youtube.com/watch?v=I0iBzP-SxFY (video by 1LittleCoder)
https://medium.com/mlearning-ai/generating-sentences-from-keywords-using-transformers-in-nlp-e89f4de5cf6b (Prakhar Mishra)

Comments

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
Hi,

I tried to install keytotext via pip install keytotext --upgrade on my local machine,

but came across the following:

ERROR: Could not find a version that satisfies the requirement keytotext (from versions: none)
ERROR: No matching distribution found for keytotext

My pip version is the latest. However, the above works just fine in Colab. Could you please guide me through the fix?
opened by abhijithneilabraham 6
Add finetuning model to keytotext

Is your feature request related to a problem? Please describe. It's difficult to use the model without fine-tuning it on a new corpus, so we need a script to fine-tune it on a new corpus.
enhancement good first issue

opened by gagan3012 2
"Oh no." ?

"Error running app. If this keeps happening, please file an issue."

Ok,...sure? I know nothing about this app.

Just saw your tweet, clicked the link to this repo, then clicked the link on the side. Got that message. Now what?

Chrome browser, Linux.

opened by drscotthawley 2
Add Citations

Is your feature request related to a problem? Please describe. Inspirations: https://towardsdatascience.com/data-to-text-generation-with-t5-building-a-simple-yet-advanced-nlg-model-b5cce5a6df45


opened by gagan3012 1
Adding new models to keytotext

Is your feature request related to a problem? Please describe. Adding new models to keytotext: https://huggingface.co/mrm8488/t5-base-finetuned-common_gen

enhancement good first issue

opened by gagan3012 1
Inference API for Keytotext

Is your feature request related to a problem? Please describe. It is difficult to host the UI on Streamlit without an API.

Describe the solution you'd like Inference API
enhancement good first issue

opened by gagan3012 1
Create Better UI

Is your feature request related to a problem? Please describe. The current UI is not functional. It needs to be fixed.

Describe the solution you'd like Better UI with a nicer design
enhancement

opened by gagan3012 1
Add `st.cache` to load model

Hi @gagan3012,

Johannes from the Streamlit team here :) I am currently investigating why apps run over the resource limits of Streamlit Sharing and saw that your app was affected in the past few days.

Thought I'd send you a small PR which should fix this. You were already on the right track with st.cache, but it gets even better if you use it once more to load the model. This makes sure the model and tokenizer are only loaded once, which should make the app consume less memory (and not run into resource limits again!). Plus, I've seen that it also works a bit faster now ;)

Hope this works for you and let me know if you have any other questions! 🎈

Cheers, Johannes

opened by jrieke 1
ValueError: transformers.models.auto.__spec__ is None

from keytotext import pipeline

Running the line above raises this error: "ValueError: transformers.models.auto.__spec__ is None"

opened by varunakk 0
Update README.md
opened by gagan3012 0
Update trainer.py
opened by gagan3012 0
Pipeline error on fresh install

Hi I'm getting this on a first run and fresh install

Global seed set to 42
Traceback (most recent call last):
  File "C:\Users\skint\PycharmProjects\spacynd2\testdata.py", line 1, in <module>
    from keytotext import pipeline
  File "C:\Users\skint\venv\lib\site-packages\keytotext\__init__.py", line 11, in <module>
    from .dataset import make_dataset
  File "C:\Users\skint\venv\lib\site-packages\keytotext\dataset.py", line 1, in <module>
    from cv2 import randShuffle
ModuleNotFoundError: No module named 'cv2'

opened by skintflickz 0
New TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'
I have imported the model and the necessary libraries, and I am getting the error below in Google Colab. I used this model a few months ago and it worked fine. This is a new issue that I have started seeing recently with the same code.

TypeError: __init__() got an unexpected keyword argument 'progress_bar_refresh_rate'

Imported libraries:

!pip install keytotext --upgrade
!sudo apt-get install git-lfs

from keytotext import trainer

Training Model:

model = trainer()
model.from_pretrained(model_name="t5-small")
model.train(train_df=df_train_final, test_df=df_test, batch_size=3, max_epochs=5, use_gpu=True)
model.save_model()

Have attached error screenshot

OS: Windows

Browser Chrome
opened by aishwaryapisal9 2
Update trainer.py
Delete progress_bar_refresh_rate in trainer.py

Description

Delete progress_bar_refresh_rate=5, since this keyword argument is no longer supported by the Trainer class in the latest version (1.7.0) of PyTorch Lightning.

Motivation and Context

Passing this argument makes the training process fail.

How Has This Been Tested?

Ran keytotext on a custom dataset before and after August 2nd, 2022. The new version of PyTorch Lightning's Trainer, which removes the above argument, took effect on that date, and custom training has failed since then.

Screenshots (if appropriate):

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have read the CONTRIBUTING document.
opened by anath2110benten 0
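
The PR above simply deletes the argument. A more defensive alternative, shown here only as a sketch and not as what the PR does, is to drop any keyword arguments that the installed class no longer accepts. `NewTrainer` is a hypothetical stand-in for a trainer class whose constructor has lost the old parameter; `supported_kwargs` is likewise an illustrative helper.

```python
import inspect


def supported_kwargs(func, kwargs):
    """Return only the keyword arguments that `func` accepts."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the callee accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {key: value for key, value in kwargs.items() if key in params}


class NewTrainer:
    """Stand-in for a trainer class that no longer takes the old argument."""

    def __init__(self, max_epochs=1):
        self.max_epochs = max_epochs


requested = {"max_epochs": 5, "progress_bar_refresh_rate": 5}
trainer = NewTrainer(**supported_kwargs(NewTrainer, requested))
```

Silently dropping arguments can hide real mistakes, so logging the discarded keys is advisable if this pattern is used.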
Why is cv2 required?

https://github.com/gagan3012/keytotext/blob/6f807b940f5e2fdeb755ed085b40af7c0fa5e87e/keytotext/dataset.py#L1

I'm using this framework to generate text from a knowledge graph. The Python interpreter keeps throwing a "cv2 not installed" exception. It looks like the pip package doesn't list cv2 as a dependency. When I deleted this line from the source code, the model worked well. Is this line necessary for this project? If so, would you consider adding opencv as a dependency of the pip package? Thanks for your attention.

opened by ChunxuYang 0
Hi, I notice that given the same input keywords, the generated text is the same across different runs, even when setting different seeds with 'pl.seed_everything(..)'.


opened by RuiFeiHe 6
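
One possible explanation, not confirmed in this thread: if decoding does not sample (greedy or beam search), the output is a deterministic function of the input and the model weights, so the random seed has no effect. The toy sketch below illustrates the difference with a made-up next-token distribution; the tokens and probabilities are invented for the example. In Hugging Face `generate`, sampling is enabled with `do_sample=True`; whether keytotext forwards that option is an assumption to verify.

```python
import random

# Toy next-token distribution; tokens and probabilities are made up.
NEXT_TOKEN_PROBS = {"Delhi": 0.5, "Mumbai": 0.3, "Kolkata": 0.2}


def greedy_pick(probs):
    """Always return the most likely token; no randomness involved."""
    return max(probs, key=probs.get)


def sampled_pick(probs, seed):
    """Draw one token according to its probability, using the given seed."""
    rng = random.Random(seed)
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


# Greedy decoding gives one answer no matter how often it runs.
greedy_outputs = {greedy_pick(NEXT_TOKEN_PROBS) for _ in range(10)}
# Sampling with different seeds produces different tokens.
sampled_outputs = {sampled_pick(NEXT_TOKEN_PROBS, seed) for seed in range(50)}
```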

Releases (v1.5.0)

v1.5.0 (Jul 9, 2021)

Trainer tool finalized and completed!

v1.4.1 (Jul 2, 2021)

Validation accuracy (val acc) added

v1.3.9 (Jul 2, 2021)

Bug fixes

v1.3.8 (Jul 2, 2021)

New module for uploading to the Hugging Face Hub

v1.3.1 (Jun 16, 2021)

Documentation updated along with semantic versioning

v0.3.1 (Jun 15, 2021)

This version features a tested trainer which can be used in 4 lines of code:

from keytotext import KeytotextTrainer

model = KeytotextTrainer()
model.from_pretrained(model_name="t5-small")
model.train(data_df=df, batch_size=4, max_epochs=3, use_gpu=True)
model.save_model()

v0.2.9 (Jun 15, 2021)

This release features the new Trainer module. More details coming soon.

v0.2.5 (May 12, 2021)

Changes:

Bug fixes
Maintaining new models

v0.2.4 (May 11, 2021)

Changes:

Refactoring of code
Ability to add new models

v0.2.3 (May 10, 2021)

Changes:

Bug fixes
New models added

v0.2.2 (May 10, 2021)

Changes:

keytotext now supports new models trained by other people
A new fine-tuning script

v0.2.1 (May 5, 2021)

Bug fixes

v0.2.0 (May 4, 2021)

Changes:

Completed API
Completed testing
Completed all evals
UI improvements

v0.1.6 (May 2, 2021)

Changes:

Updates to Eval pipeline

v0.1.5 (May 2, 2021)

Changes:

Added Trainer API
Added Eval pipeline

v0.1.4 (Apr 30, 2021)

Latest release

v0.1.3 (Apr 27, 2021)

Updates

0.1.1 (Apr 26, 2021)

0.1.0 (Apr 26, 2021)

Production release 0.1.0

Owner

Gagan Bhatia

Software Developer | Machine Learning Enthusiast

GitHub Repository https://share.streamlit.io/gagan3012/keytotext/UI/app.py

