EMNLP 2020 - Summarizing Text on Any Aspects

Overview

Summarizing Text on Any Aspects

This repo contains preliminary code of the following paper:

Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
Bowen Tan, Lianhui Qin, Eric P. Xing, Zhiting Hu
EMNLP 2020
[ArXiv] [Slides]

Getting Started

  • Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect.
  • In this work, we study summarizing on arbitrary aspects relevant to the document.
  • Due to the lack of supervision data, we develop a new weak supervision construction method integrating rich external knowledge sources such as ConceptNet and Wikipedia.

Requirements

Our python version is 3.8, required packages can be installed by

pip install -r requrements.txt

Our code can run on a single GTX 1080Ti GPU.

Datasets & Knowledge Sources

Weakly Supervised Dataset

Our constructed weakly supervised dataset can be downloaded by

bash data_utils/download_weaksup.sh

Downloaded data will be saved into data/weaksup/.

We also provide the code to construct it. For more details, see

MA-News Dataset

MA-News Dataset is a aspect summarization dataset constructed by (Frermann et al.) . Its aspects are restricted to only 6 coarsegrained topics. We use MA-News dataset for our automatic evaluation. Scripts to make MA-News is here.

A JSON version processed by us can be download by

bash data_utils/download_manews.sh

Downloaded data will be saved into data/manews/.

Knowledge Graph - ConceptNet

ConceptNet is a huge multilingual commonsense knowledge graph. We extract an English subset that can be downloaded by

bash data_utils/download_concept_net.sh

Knowledge Base - Wikipedia

Wikipedia is an encyclopaedic knowledge base. We use its python API to access it online, so make sure your web connection is good when running our code.

Weakly Supervised Model

Train

Run this command to finetune a weakly supervised model from pretrained BART model (Lewis et al.).

python finetune.py --dataset_name weaksup --train_docs 100000 --n_epochs 1

Training logs and checkpoints will be saved into logs/weaksup/docs100000/

The training takes ~48h on a single GTX 1080Ti GPU. You may want to directly download the training log and the trained model here.

Generation

Run this command to generate on MA-News test set with the weakly supervised model.

python generate.py --log_path logs/weaksup/docs100000/

Source texts, target texts, generated texts will be saved as test.source, test.gold, and test.hypo respectively, into the log dir: logs/weaksup/docs100000/.

Evaluation

To run evaluation, make sure you have installed java and files2rouge on your device.

First, download stanford nlp by

python data_utils/download_stanford_core_nlp.py

and run

bash evaluate.sh logs/weaksup/docs100000/

to get rouge scores. Results will be saved in logs/weaksup/docs100000/rouge_scores.txt.

Finetune with MA-News Training Data

Baseline

Run this command to finetune a BART model with 1K MA-News training data examples.

python finetune.py --dataset_name manews --train_docs 1000 --wiki_sup False
python generate.py --log_path logs/manews/docs1000/ --wiki_sup False
bash evaluate.sh logs/manews/docs1000/

Results will be saved in logs/manews/docs1000/.

+ Weak Supervision

Run this command to finetune with 1K MA-News training data examples starting with our weakly supervised model.

python finetune.py --dataset_name manews --train_docs 1000 --pretrained_ckpt logs/weaksup/docs100000/best_model.ckpt
python generate.py --log_path logs/manews_plus/docs1000/
bash evaluate.sh logs/manews_plus/docs1000/

Results will be saved in logs/manews_plus/docs1000/.

Results

Results on MA-News dataset are as below (same setting as paper Table 2).

All the detailed logs, including training log, generated texts, and rouge scores, are available here.

(Note: The result numbers may be slightly different from those in the paper due to slightly different implementation details and random seeds, while the improvements over comparison methods are consistent.)

Model ROUGE-1 ROUGE-2 ROUGE-L
Weak-Sup Only 28.41 10.18 25.34
MA-News-Sup 1K 24.34 8.62 22.40
MA-News-Sup 1K + Weak-Sup 34.10 14.64 31.45
MA-News-Sup 3K 26.38 10.09 24.37
MA-News-Sup 3K + Weak-Sup 37.40 16.87 34.51
MA-News-Sup 10K 38.71 18.02 35.78
MA-News-Sup 10K + Weak-Sup 39.92 18.87 36.98

Demo

We provide a demo on a real news on Feb. 2021. (see demo_input.json).

To run the demo, download our trained model here, and run the command below

python demo.py --ckpt_path logs/weaksup/docs100000/best_model.ckpt
Owner
Bowen Tan
Bowen Tan
E-Ink Magic Calendar that automatically syncs to Google Calendar and runs off a battery powered Raspberry Pi Zero

MagInkCal This repo contains the code needed to drive an E-Ink Magic Calendar that uses a battery powered (PiSugar2) Raspberry Pi Zero WH to retrieve

2.8k Dec 28, 2022
Differentiable rasterization applied to 3D model simplification tasks

nvdiffmodeling Differentiable rasterization applied to 3D model simplification tasks, as described in the paper: Appearance-Driven Automatic 3D Model

NVIDIA Research Projects 336 Dec 30, 2022
A curated list of Machine Learning and Deep Learning tutorials in Jupyter Notebook format ready to run in Google Colaboratory

Awesome Machine Learning Jupyter Notebooks for Google Colaboratory A curated list of Machine Learning and Deep Learning tutorials in Jupyter Notebook

Carlos Toxtli 245 Jan 01, 2023
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API

FedTorch is a generic repository for benchmarking different federated and distributed learning algorithms using PyTorch Distributed API.

Machine Learning and Optimization Lab @PennState 136 Dec 23, 2022
pytorch, hand(object) detect ,yolo v5,手检测

YOLO V5 物体检测,包括手部检测。 项目介绍 手部检测 手部检测示例如下 : 视频示例: 项目配置 作者开发环境: Python 3.7 PyTorch = 1.5.1 数据集 手部检测数据集 该项目数据集采用 TV-Hand 和 COCO-Hand (COCO-Hand-Big 部分) 进

Eric.Lee 11 Dec 20, 2022
The repository includes the code for training cell counting applications. (Keras + Tensorflow)

cell_counting_v2 The repository includes the code for training cell counting applications. (Keras + Tensorflow) Dataset can be downloaded here : http:

Weidi 113 Oct 06, 2022
RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into tables through jointly extracting intervention, outcome and outcome measure entities and their relations.

Randomised controlled trial abstract result tabulator RCT-ART is an NLP pipeline built with spaCy for converting clinical trial result sentences into

2 Sep 16, 2022
This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Introduction: X-Ray Report Generation This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". O

no name 36 Dec 16, 2022
An SE(3)-invariant autoencoder for generating the periodic structure of materials

Crystal Diffusion Variational AutoEncoder This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic st

Tian Xie 94 Dec 10, 2022
Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Diffusion Probabilistic Models This repository provides a reference implementation of the method described in the paper: Deep Unsupervised Learning us

Jascha Sohl-Dickstein 238 Jan 02, 2023
Melanoma Skin Cancer Detection using Convolutional Neural Networks and Transfer Learning🕵🏻‍♂️

This is a Kaggle competition in which we have to identify if the given lesion image is malignant or not for Melanoma which is a type of skin cancer.

Vipul Shinde 1 Jan 27, 2022
Dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

CaFM-pytorch ICCV ACCEPT Introduction of dataset VSD4K Our dataset VSD4K includes 6 popular categories: game, sport, dance, vlog, interview and city.

96 Jul 05, 2022
Official Pytorch implementation of ICLR 2018 paper Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge.

Deep Learning for Physical Processes: Integrating Prior Scientific Knowledge: Official Pytorch implementation of ICLR 2018 paper Deep Learning for Phy

emmanuel 47 Nov 06, 2022
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

Documentation | FAQ | Release Notes | Roadmap | MACE Model Zoo | Demo | Join Us | 中文 Mobile AI Compute Engine (or MACE for short) is a deep learning i

Xiaomi 4.7k Dec 29, 2022
Just Randoms Cats with python

Random-Cat Just Randoms Cats with python.

OriCode 2 Dec 21, 2021
EPSANet:An Efficient Pyramid Split Attention Block on Convolutional Neural Network

EPSANet:An Efficient Pyramid Split Attention Block on Convolutional Neural Network This repo contains the official Pytorch implementaion code and conf

Hu Zhang 175 Jan 07, 2023
Project code for weakly supervised 3D object detectors using wide-baseline multi-view traffic camera data: WIBAM.

WIBAM (Work in progress) Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data 3D object dete

Matthew Howe 10 Aug 24, 2022
Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

Kuleen Sasse 3 Mar 26, 2022
Implementation of popular SOTA self-supervised learning algorithms as Fastai Callbacks.

Self Supervised Learning with Fastai Implementation of popular SOTA self-supervised learning algorithms as Fastai Callbacks. Install pip install self-

Kerem Turgutlu 276 Dec 23, 2022