Package for controllable summarization

Overview

summarizers

PyPI version GitHub

  • summarizers is package for controllable summarization based CTRLsum.
  • currently, we only supports English. It doesn't work in other languages.

Installation

pip install summarizers

Usage

1. Create Summarizers

  • First at all, create summarizers obejct to summarize your own article.
>>> from summarizers import Summarizers
>>> summ = Summarizers()
  • You can select type of source article between [normal, paper, patent].
  • If you don't input any parameter, default type is normal.
>>> from summarizers import Summarizers
>>> summ = Summarizers('normal')  # <-- default.
>>> summ = Summarizers('paper')
>>> summ = Summarizers('patent')
  • If you want GPU acceleration, set param device='cuda'.
>>> from summarizers import Summarizers
>>> summ = Summarizers('normal', device='cuda')

2. Basic Summarization

  • If you inputted source article, basic summariztion is conducted.
>>> contents = """
Tunip is the Octonauts' head cook and gardener. 
He is a Vegimal, a half-animal, half-vegetable creature capable of breathing on land as well as underwater. 
Tunip is very childish and innocent, always wanting to help the Octonauts in any way he can. 
He is the smallest main character in the Octonauts crew.
"""
>>> summ(contents)
'Tunip is a Vegimal, a half-animal, half-vegetable creature'

3. Query focused Summarization

  • If you want to input query together, Query focused summarization conducted.
>>> summ(contents, query="main character of Octonauts")
'Tunip is the smallest main character in the Octonauts crew.'

3. Abstractive QA (Auto Question Detection)

  • If you inputted question as query, Abstractive QA is conducted.
>>> summ(contents, query="What is Vegimal?")
'Half-animal, half-vegetable'
  • You can turn off this feature by setting param question_detection=False.
>>> summ(contents, query="SOME_QUERY", question_detection=False)

4. Prompt based Summarization

  • You can generate summary that begins with some sequence using param prompt.
  • It works like GPT-3's Prompt based generation. (but It doesn't work very well.)
>>> summ(contents, prompt="Q:Who is Tunip? A:")
"Q:Who is Tunip? A: Tunip is the Octonauts' head"

5. Query focused Summarization with Prompt

  • You can also input both query and prompt.
  • In this case, a query focus summary is generated that starts with a prompt.
>>> summ(contents, query="personality of Tunip", prompt="Tunip is very")
"Tunip is very childish and innocent, always wanting to help the Octonauts."

6. Options for Decoding Strategy

  • For generative models, decoding strategy is very important.
  • summarizers support variety of options for decoding strategy.
>>> summ(
...     contents=contents,
...     num_beams=10,
...     top_k=30,
...     top_p=0.85,
...     no_repeat_ngram_size=3,                  
... )

License

Copyright 2021 Hyunwoong Ko.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
Hyunwoong Ko
Research Engineer at @tunib-ai. previously @kakaobrain.
Hyunwoong Ko
🏖 Easy training and deployment of seq2seq models.

Headliner Headliner is a sequence modeling library that eases the training and in particular, the deployment of custom sequence models for both resear

Axel Springer Ideas Engineering GmbH 231 Nov 18, 2022
Contact Extraction with Question Answering.

contactsQA Extraction of contact entities from address blocks and imprints with Extractive Question Answering. Goal Input: Dr. Max Mustermann Hauptstr

Jan 2 Apr 20, 2022
Yet another Python binding for fastText

pyfasttext Warning! pyfasttext is no longer maintained: use the official Python binding from the fastText repository: https://github.com/facebookresea

Vincent Rasneur 230 Nov 16, 2022
Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

IMDB Sentiment Analysis This is the final project of Machine Learning Courses in Huazhong University of Science and Technology, School of Artificial I

Daniel 0 Dec 27, 2021
Fastseq 基于ONNXRUNTIME的文本生成加速框架

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Jun Gao 9 Nov 09, 2021
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 884 Nov 11, 2022
GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Eva

Yunxiao Zhao 5 Nov 04, 2022
Tools for curating biomedical training data for large-scale language modeling

Tools for curating biomedical training data for large-scale language modeling

BigScience Workshop 242 Dec 25, 2022
Black for Python docstrings and reStructuredText (rst).

Style-Doc Style-Doc is Black for Python docstrings and reStructuredText (rst). It can be used to format docstrings (Google docstring format) in Python

Telekom Open Source Software 13 Oct 24, 2022
A framework for cleaning Chinese dialog data

A framework for cleaning Chinese dialog data

Yida 136 Dec 20, 2022
EasyTransfer is designed to make the development of transfer learning in NLP applications easier.

EasyTransfer is designed to make the development of transfer learning in NLP applications easier. The literature has witnessed the success of applying

Alibaba 819 Jan 03, 2023
Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

Jamie J. Seol 287 Dec 14, 2022
Research code for the paper "Fine-tuning wav2vec2 for speaker recognition"

Fine-tuning wav2vec2 for speaker recognition This is the code used to run the experiments in https://arxiv.org/abs/2109.15053. Detailed logs of each t

Nik 103 Dec 26, 2022
Simple text to phones converter for multiple languages

Phonemizer -- foʊnmaɪzɚ The phonemizer allows simple phonemization of words and texts in many languages. Provides both the phonemize command-line tool

CoML 762 Dec 29, 2022
TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Alexa 98 Dec 09, 2022
Asr abc - Automatic speech recognition(ASR),中文语音识别

语音识别的简单示例,主要在课堂演示使用 创建python虚拟环境 在linux 和macos 上验证通过 # 如果已经有pyhon3.6 环境,跳过该步骤,使用

LIyong.Guo 8 Nov 11, 2022
Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers.

Cherche (search in French) allows you to create a neural search pipeline using retrievers and pre-trained language models as rankers. Cherche is meant to be used with small to medium sized corpora. C

Raphael Sourty 224 Nov 29, 2022
Pipelines de datos, 2021.

Este repo ilustra un proceso sencillo de automatización de transformación y modelado de datos, a través de un pipeline utilizando Luigi. Stack princip

Rodolfo Ferro 8 May 19, 2022
A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

A framework for evaluating Knowledge Graph Embedding Models in a fine-grained manner.

NEC Laboratories Europe 13 Sep 08, 2022
This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

This project is part of Eleuther AI's quest to create a massive repository of high quality text data for training language models.

EleutherAI 42 Dec 13, 2022