EMNLP 2020 - Summarizing Text on Any Aspects

This repo contains preliminary code for the following paper:

Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
Bowen Tan, Lianhui Qin, Eric P. Xing, Zhiting Hu
EMNLP 2020
[ArXiv] [Slides]

Getting Started

  • Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect.
  • In this work, we study summarizing on arbitrary aspects relevant to the document.
  • Due to the lack of supervision data, we develop a new weak supervision construction method integrating rich external knowledge sources such as ConceptNet and Wikipedia.

Requirements

We use Python 3.8. Required packages can be installed with

pip install -r requirements.txt

Our code can run on a single GTX 1080Ti GPU.

Datasets & Knowledge Sources

Weakly Supervised Dataset

Our constructed weakly supervised dataset can be downloaded by

bash data_utils/download_weaksup.sh

Downloaded data will be saved into data/weaksup/.

We also provide the code to construct it. For more details, see

MA-News Dataset

MA-News is an aspect-based summarization dataset constructed by Frermann et al. Its aspects are restricted to only 6 coarse-grained topics. We use MA-News for our automatic evaluation. Scripts to build MA-News are here.

A JSON version processed by us can be downloaded with

bash data_utils/download_manews.sh

Downloaded data will be saved into data/manews/.

Knowledge Graph - ConceptNet

ConceptNet is a huge multilingual commonsense knowledge graph. We extract an English subset that can be downloaded by

bash data_utils/download_concept_net.sh
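
As a rough illustration of how a knowledge graph like ConceptNet can relate an aspect word to other concepts, here is a minimal sketch. It does not read the downloaded subset, whose file format is not described here. The triples are made up in memory, and the helper names `build_neighbors` and `related_concepts` are hypothetical, not part of this repo.

```python
from collections import defaultdict

# Made-up (head, relation, tail) triples for illustration only.
# "RelatedTo" and "IsA" are ConceptNet relation names; the edges are invented.
TRIPLES = [
    ("sport", "RelatedTo", "game"),
    ("sport", "RelatedTo", "athlete"),
    ("athlete", "IsA", "person"),
    ("football", "IsA", "sport"),
]


def build_neighbors(triples):
    """Index triples as an undirected adjacency map: concept -> set of neighbors."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def related_concepts(neighbors, concept):
    """Return the sorted one-hop neighbors of a concept (empty list if unseen)."""
    return sorted(neighbors.get(concept, set()))


neighbors = build_neighbors(TRIPLES)
print(related_concepts(neighbors, "sport"))
```

The relation label is ignored here to keep the sketch short; a real pipeline would likely filter by relation type and edge weight.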

Knowledge Base - Wikipedia

Wikipedia is an encyclopaedic knowledge base. We access it online through its Python API, so make sure you have a stable internet connection when running our code.

Weakly Supervised Model

Train

Run this command to finetune a weakly supervised model from the pretrained BART model (Lewis et al.).

python finetune.py --dataset_name weaksup --train_docs 100000 --n_epochs 1

Training logs and checkpoints will be saved into logs/weaksup/docs100000/

The training takes ~48h on a single GTX 1080Ti GPU. You may want to directly download the training log and the trained model here.

Generation

Run this command to generate on MA-News test set with the weakly supervised model.

python generate.py --log_path logs/weaksup/docs100000/

Source texts, target texts, and generated texts will be saved as test.source, test.gold, and test.hypo respectively, in the log dir: logs/weaksup/docs100000/.
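
To inspect test.source, test.gold, and test.hypo side by side, something like the sketch below may help. It assumes each file holds one example per line and that the three files are line-aligned; this layout is an assumption, not something stated above. The function name `load_generation_outputs` is ours and is not part of the repo.

```python
from pathlib import Path


def load_generation_outputs(log_dir):
    """Read test.source / test.gold / test.hypo from log_dir as aligned records.

    Assumes one example per line in each file (an assumption of this sketch).
    """
    log_dir = Path(log_dir)
    columns = {}
    for name in ("source", "gold", "hypo"):
        with open(log_dir / f"test.{name}", encoding="utf-8") as f:
            columns[name] = [line.rstrip("\n") for line in f]

    # All three files must describe the same number of examples.
    lengths = {name: len(lines) for name, lines in columns.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Files are not line-aligned: {lengths}")

    return [
        {"source": s, "gold": g, "hypo": h}
        for s, g, h in zip(columns["source"], columns["gold"], columns["hypo"])
    ]
```

For example, `load_generation_outputs("logs/weaksup/docs100000/")[0]` would return the first source document together with its reference and generated summaries.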

Evaluation

To run evaluation, make sure Java and files2rouge are installed on your machine.

First, download Stanford CoreNLP by

python data_utils/download_stanford_core_nlp.py

and run

bash evaluate.sh logs/weaksup/docs100000/

to get ROUGE scores. Results will be saved in logs/weaksup/docs100000/rouge_scores.txt.
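
For intuition about what a ROUGE-N score measures, here is a simplified sketch of an n-gram overlap F1 between one generated summary and one reference. It is not what evaluate.sh runs: it uses lowercasing and whitespace tokenization only, so its numbers will generally differ from those reported by files2rouge. The function names are ours.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(hypothesis, reference, n=1):
    """Simplified ROUGE-N F1 with clipped n-gram overlap."""
    hyp = ngram_counts(hypothesis.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((hyp & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge_n_f1("the cat", "the cat sat on", n=1))
```

In this call both hypothesis unigrams appear in the reference, so precision is 1.0 and recall is 0.5.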

Finetune with MA-News Training Data

Baseline

Run these commands to finetune a BART model on 1K MA-News training examples, then generate and evaluate on the test set.

python finetune.py --dataset_name manews --train_docs 1000 --wiki_sup False
python generate.py --log_path logs/manews/docs1000/ --wiki_sup False
bash evaluate.sh logs/manews/docs1000/

Results will be saved in logs/manews/docs1000/.

+ Weak Supervision

Run these commands to finetune on 1K MA-News training examples starting from our weakly supervised model, then generate and evaluate on the test set.

python finetune.py --dataset_name manews --train_docs 1000 --pretrained_ckpt logs/weaksup/docs100000/best_model.ckpt
python generate.py --log_path logs/manews_plus/docs1000/
bash evaluate.sh logs/manews_plus/docs1000/

Results will be saved in logs/manews_plus/docs1000/.

Results

Results on the MA-News dataset are shown below (same setting as Table 2 in the paper).

All the detailed logs, including training logs, generated texts, and ROUGE scores, are available here.

(Note: The result numbers may be slightly different from those in the paper due to slightly different implementation details and random seeds, while the improvements over comparison methods are consistent.)

Model                        ROUGE-1  ROUGE-2  ROUGE-L
Weak-Sup Only                  28.41    10.18    25.34
MA-News-Sup 1K                 24.34     8.62    22.40
MA-News-Sup 1K + Weak-Sup      34.10    14.64    31.45
MA-News-Sup 3K                 26.38    10.09    24.37
MA-News-Sup 3K + Weak-Sup      37.40    16.87    34.51
MA-News-Sup 10K                38.71    18.02    35.78
MA-News-Sup 10K + Weak-Sup     39.92    18.87    36.98
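
To make the effect of weak supervision easier to read off, the short sketch below computes the absolute gain of each "+ Weak-Sup" row over its MA-News-only baseline. The numbers are copied from the results above; the dictionary layout and the function name `weaksup_gains` are ours.

```python
# (ROUGE-1, ROUGE-2, ROUGE-L) copied from the results above.
RESULTS = {
    "1K": {"baseline": (24.34, 8.62, 22.40), "plus_weaksup": (34.10, 14.64, 31.45)},
    "3K": {"baseline": (26.38, 10.09, 24.37), "plus_weaksup": (37.40, 16.87, 34.51)},
    "10K": {"baseline": (38.71, 18.02, 35.78), "plus_weaksup": (39.92, 18.87, 36.98)},
}


def weaksup_gains(results):
    """Absolute ROUGE gain of '+ Weak-Sup' over the baseline, per training size."""
    return {
        size: tuple(
            round(plus - base, 2)
            for base, plus in zip(row["baseline"], row["plus_weaksup"])
        )
        for size, row in results.items()
    }


for size, gain in weaksup_gains(RESULTS).items():
    print(size, gain)
```

The ROUGE-1 gains come out to 9.76 at 1K, 11.02 at 3K, and 1.21 at 10K, so the benefit is largest when little MA-News training data is available.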

Demo

We provide a demo on a real news article from Feb. 2021 (see demo_input.json).

To run the demo, download our trained model here, and run the command below

python demo.py --ckpt_path logs/weaksup/docs100000/best_model.ckpt