Understand Text Summarization and create your own summarizer in Python

Last update: Oct 18, 2022

Overview

We all interact with applications that use text summarization. Many of them serve platforms that publish daily articles on news, entertainment, and sports. With our busy schedules, we often prefer to read a summary of an article before deciding whether to read the whole thing. A summary helps us judge whether the topic interests us and gives a brief context of the story.

Summarization can be defined as the task of producing a concise and fluent summary while preserving key information and overall meaning.

Impact:

Summarization systems often have additional evidence they can use to identify the most important topics of a document or set of documents. For example, when summarizing blogs, the discussions or comments that follow a post are good sources of information for deciding which parts of the post are critical and interesting. In scientific paper summarization, there is a considerable amount of extra information, such as cited papers and conference information, that can be leveraged to identify important sentences in the original paper.

How text summarization works:

In general, there are two types of summarization: abstractive and extractive.

1. Abstractive Summarization:

Abstractive methods select words based on semantic understanding, including words that do not appear in the source documents. The aim is to convey the important material in new wording. These methods interpret and examine the text using advanced natural language techniques to generate a new, shorter text that conveys the most critical information from the original.

Input document → understand context → semantics → create own summary

2. Extractive Summarization:

Extractive methods summarize an article by selecting a subset of its original text, typically whole sentences, that retains the most important points; no new wording is generated.

Input document → sentence similarity → weight sentences → select sentences with higher rank.
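To make the extractive flow concrete, here is a minimal sketch, not the repository's code. It assumes Jaccard word overlap as the sentence-similarity measure (chosen only for brevity), weights each sentence by the sum of its similarities to all the others, and keeps the top k sentences in their original order. The helper names are hypothetical, and stop words are not removed here.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Size of the word intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extract_summary(sentences, k):
    """Weight each sentence by its summed similarity to the others; keep the top k."""
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [sum(jaccard(words[i], words[j]) for j in range(n) if j != i) for i in range(n)]
    # Highest weight first, then restore the original sentence order.
    top = sorted(range(n), key=lambda i: weights[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Weighting by summed similarity favours sentences that share vocabulary with many others; graph-ranking methods refine this by also accounting for how important those neighbouring sentences are.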

Below is the code flow used to generate the summarized text:

Input article → split into sentences → remove stop words → build a similarity matrix → generate rank based on matrix → pick top N sentences for summary.
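The flow above can be sketched in plain Python as follows. This is an illustrative sketch, not the code in text-summarizer.py: the function names, the regex sentence splitter, and the small STOP_WORDS set are assumptions; cosine similarity over word counts stands in for whatever measure the repository uses; and the ranking step is a weighted PageRank (the idea behind TextRank) computed by power iteration, so no third-party packages are needed.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a fuller list such as NLTK's is more typical).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "will", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Count the lowercased words of a sentence, skipping stop words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Pairwise sentence similarities; the diagonal is 0 so there are no self-loops."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(vectors)
    return [[0.0 if i == j else cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def textrank(matrix, damping=0.85, max_iter=100, tol=1e-6):
    """Weighted PageRank over the similarity graph, computed by power iteration."""
    n = len(matrix)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in matrix]  # total edge weight leaving each sentence
    for _ in range(max_iter):
        # Score held by sentences with no similar neighbours is spread evenly.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        new_scores = [
            (1 - damping) / n
            + damping * dangling / n
            + damping * sum(matrix[j][i] / out_weight[j] * scores[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
        delta = sum(abs(new - old) for new, old in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def generate_summary(text, top_n=2):
    """Return the top_n highest-ranked sentences, joined in their original order."""
    sentences = split_sentences(text)
    scores = textrank(similarity_matrix(sentences))
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return " ".join(sentences[i] for i in sorted(best))
```

For example, generate_summary(open("msft.txt", encoding="utf-8").read(), top_n=2) would return the two highest-ranked sentences of the article. In practice, NLTK's stop-word corpus and NetworkX's pagerank function can replace the hand-written pieces.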

How to run:

1. Clone the repository: git clone https://github.com/Vicky1-bot/Text-summarizer-using-NLP.git

2. Set up a virtual environment and activate it (for example, python -m venv venv followed by source venv/bin/activate).

3. Install the requirements: pip install -r requirements.txt

4. Run the application: python text-summarizer.py

That's it; the result is printed in the terminal.

Let’s look at it in action.

The input is the complete text of an article titled "Microsoft Launches Intelligent Cloud Hub To Upskill Students In AI & Cloud Technologies", saved as msft.txt.

With msft.txt as the input file and 2 given as the number of summary lines, the summarized text is:

Envisioned as a three-year collaborative program, Intelligent Cloud Hub will support around 100 institutions with AI infrastructure, course content and curriculum, developer support, development tools and give students access to cloud and AI services. The company will provide AI development tools and Azure AI services such as Microsoft Cognitive Services, Bot Services and Azure Machine Learning. According to Manish Prakash, Country General Manager-PS, Health and Education, Microsoft India, said, "With AI being the defining technology of our time, it is transforming lives and industry and the jobs of tomorrow will require a different skillset.

Conclusion:

As you can see, it does a pretty good job. You can customize it further to limit the summary by number of characters instead of number of lines.
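One way to cap the summary by characters, sketched here under the assumption that you already have the sentences and a score for each (for example, from the ranking step), is to add sentences greedily in score order while the joined summary stays within a character budget. summary_within_chars is a hypothetical helper, not part of the repository.

```python
def summary_within_chars(sentences, scores, max_chars):
    """Greedily keep the highest-scoring sentences whose joined length fits in max_chars.

    A sentence that would overflow the budget is skipped, so a shorter,
    lower-ranked sentence may still fit after a longer one is rejected.
    """
    chosen = []
    used = 0
    for i in sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True):
        extra = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + extra <= max_chars:
            chosen.append(i)
            used += extra
    # Present the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))
```

Skipping rather than stopping at the first sentence that does not fit uses more of the budget, at the cost of sometimes including a lower-ranked sentence ahead of a higher-ranked one.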

It is important to understand that we used TextRank to rank the sentences. TextRank is a general-purpose, graph-based ranking algorithm for NLP; it does not rely on any previous training data and can work with any arbitrary piece of text.
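For reference, the weighted score defined in the original TextRank paper (Mihalcea and Tarau, 2004) can be written as below. Each vertex V_i is a sentence, w_ji is the similarity weight on the edge from V_j to V_i, and d is a damping factor usually set to 0.85; the scores are updated iteratively until they converge. For an undirected sentence graph, In(V_i) and Out(V_j) are both simply the neighbours of the vertex.

```latex
WS(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)} \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, WS(V_j)
```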

Owner: Sreekanth M
