Package for controllable summarization

Overview

summarizers

PyPI version GitHub

  • summarizers is package for controllable summarization based CTRLsum.
  • currently, we only supports English. It doesn't work in other languages.

Installation

pip install summarizers

Usage

1. Create Summarizers

  • First at all, create summarizers obejct to summarize your own article.
>>> from summarizers import Summarizers
>>> summ = Summarizers()
  • You can select type of source article between [normal, paper, patent].
  • If you don't input any parameter, default type is normal.
>>> from summarizers import Summarizers
>>> summ = Summarizers('normal')  # <-- default.
>>> summ = Summarizers('paper')
>>> summ = Summarizers('patent')
  • If you want GPU acceleration, set param device='cuda'.
>>> from summarizers import Summarizers
>>> summ = Summarizers('normal', device='cuda')

2. Basic Summarization

  • If you inputted source article, basic summariztion is conducted.
>>> contents = """
Tunip is the Octonauts' head cook and gardener. 
He is a Vegimal, a half-animal, half-vegetable creature capable of breathing on land as well as underwater. 
Tunip is very childish and innocent, always wanting to help the Octonauts in any way he can. 
He is the smallest main character in the Octonauts crew.
"""
>>> summ(contents)
'Tunip is a Vegimal, a half-animal, half-vegetable creature'

3. Query focused Summarization

  • If you want to input query together, Query focused summarization conducted.
>>> summ(contents, query="main character of Octonauts")
'Tunip is the smallest main character in the Octonauts crew.'

3. Abstractive QA (Auto Question Detection)

  • If you inputted question as query, Abstractive QA is conducted.
>>> summ(contents, query="What is Vegimal?")
'Half-animal, half-vegetable'
  • You can turn off this feature by setting param question_detection=False.
>>> summ(contents, query="SOME_QUERY", question_detection=False)

4. Prompt based Summarization

  • You can generate summary that begins with some sequence using param prompt.
  • It works like GPT-3's Prompt based generation. (but It doesn't work very well.)
>>> summ(contents, prompt="Q:Who is Tunip? A:")
"Q:Who is Tunip? A: Tunip is the Octonauts' head"

5. Query focused Summarization with Prompt

  • You can also input both query and prompt.
  • In this case, a query focus summary is generated that starts with a prompt.
>>> summ(contents, query="personality of Tunip", prompt="Tunip is very")
"Tunip is very childish and innocent, always wanting to help the Octonauts."

6. Options for Decoding Strategy

  • For generative models, decoding strategy is very important.
  • summarizers support variety of options for decoding strategy.
>>> summ(
...     contents=contents,
...     num_beams=10,
...     top_k=30,
...     top_p=0.85,
...     no_repeat_ngram_size=3,                  
... )

License

Copyright 2021 Hyunwoong Ko.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
Hyunwoong Ko
Research Engineer at @tunib-ai. previously @kakaobrain.
Hyunwoong Ko
MEDIALpy: MEDIcal Abbreviations Lookup in Python

A small python package that allows the user to look up common medical abbreviations.

Aberystwyth Systems Biology 7 Nov 09, 2022
Python functions for summarizing and improving voice dictation input.

Helpmespeak Help me speak uses Python functions for summarizing and improving voice dictation input. Get started with OpenAI gpt-3 OpenAI is a amazing

Margarita Humanitarian Foundation 6 Dec 17, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
Vad-sli-asr - A Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

VAD-SLI-ASR Python scripts for a speech processing pipeline with Voice Activity

Dynamics of Language 14 Dec 09, 2022
Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields

Geometry-Consistent Neural Shape Representation with Implicit Displacement Fields [project page][paper][cite] Geometry-Consistent Neural Shape Represe

Yifan Wang 100 Dec 19, 2022
Autoregressive Entity Retrieval

The GENRE (Generative ENtity REtrieval) system as presented in Autoregressive Entity Retrieval implemented in pytorch. @inproceedings{decao2020autoreg

Meta Research 611 Dec 16, 2022
MHtyper is an end-to-end pipeline for recognized the Forensic microhaplotypes in Nanopore sequencing data.

MHtyper is an end-to-end pipeline for recognized the Forensic microhaplotypes in Nanopore sequencing data. It is implemented using Python.

willow 6 Jun 27, 2022
Задания КЕГЭ по информатике 2021 на Python

КЕГЭ 2021 на Python В этом репозитории мои решения типовых заданий КЕГЭ по информатике в 2021 году, БЕСПЛАТНО! Задания Взяты с https://inf-ege.sdamgia

8 Oct 13, 2022
Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles

Which Apple Keeps Which Doctor Away? Colorful Word Representations with Visual Oracles (TASLP 2022)

Zhuosheng Zhang 3 Apr 14, 2022
中文医疗信息处理基准CBLUE: A Chinese Biomedical LanguageUnderstanding Evaluation Benchmark

English | 中文说明 CBLUE AI (Artificial Intelligence) is playing an indispensabe role in the biomedical field, helping improve medical technology. For fur

452 Dec 30, 2022
Abhijith Neil Abraham 2 Nov 05, 2021
An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

Extension - matrix and vocabulary extractor for TF-IDF and Doc2Vec An extension for ASReview that adds a tf-idf extractor that saves the matrix and th

ASReview 4 Jun 17, 2022
Source code of the "Graph-Bert: Only Attention is Needed for Learning Graph Representations" paper

Graph-Bert Source code of "Graph-Bert: Only Attention is Needed for Learning Graph Representations". Please check the script.py as the entry point. We

14 Mar 25, 2022
All the code I wrote for Overwatch-related projects that I still own the rights to.

overwatch_shit.zip This is (eventually) going to contain all the software I wrote during my five-year imprisonment stay playing Overwatch. I'll be add

zkxjzmswkwl 2 Dec 31, 2021
Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

Leah Pathan Khan 2 Jan 12, 2022
Linear programming solver for paper-reviewer matching and mind-matching

Paper-Reviewer Matcher A python package for paper-reviewer matching algorithm based on topic modeling and linear programming. The algorithm is impleme

Titipat Achakulvisut 66 Jul 05, 2022
PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI

data2vec-pytorch PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI (F

Aryan Shekarlaban 105 Jan 04, 2023
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
Trains an OpenNMT PyTorch model and SentencePiece tokenizer.

Trains an OpenNMT PyTorch model and SentencePiece tokenizer. Designed for use with Argos Translate and LibreTranslate.

Argos Open Tech 61 Dec 13, 2022
NLP topic mdel LDA - Gathered from New York Times website

NLP topic mdel LDA - Gathered from New York Times website

1 Oct 14, 2021