A toolkit for document-level event extraction, containing some SOTA model implementations

Last update: Dec 15, 2022

Overview

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker

Source code for ACL-IJCNLP 2021 Long paper: Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker.

Our code is based on Doc2EDAG.

0. Introduction

Document-level event extraction aims to extract events within a document. Unlike in sentence-level event extraction, the arguments of an event record may be scattered across sentences, which requires a comprehensive understanding of the cross-sentence context. Besides, a document may express several correlated events simultaneously, and recognizing the interdependency among them is fundamental to successful extraction. To tackle these two challenges, we propose a novel heterogeneous Graph-based Interaction Model with a Tracker (GIT). A graph-based interaction network with heterogeneous edges captures the global context for event arguments scattered across sentences. A Tracker module decodes event records while keeping track of the records already extracted, so that the interdependency among events is taken into account. Our approach delivers better results than the state-of-the-art methods, especially in cross-sentence and multiple-event scenarios.
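To make the idea of a heterogeneous graph over a document more concrete, here is a minimal sketch in plain Python. It is not the code in `dee_model.py`. The node types (sentences and entity mentions), the four edge-type names, the input format, and the function name `build_hetero_edges` are all assumptions made for illustration; the text above only states that the graph uses different heterogeneous edges.

```python
from collections import defaultdict
from itertools import combinations


def build_hetero_edges(num_sentences, mentions):
    """Group edges by type for a toy document graph (illustrative only).

    mentions: list of (sentence_index, entity_id) pairs. This input format
    and the edge-type names below are hypothetical.
    """
    edges = defaultdict(list)

    # Sentence-sentence edges: connect every pair of sentences.
    for i, j in combinations(range(num_sentences), 2):
        edges["sent-sent"].append((i, j))

    # Sentence-mention edges: link each mention to the sentence containing it.
    for m_idx, (sent_idx, _entity) in enumerate(mentions):
        edges["sent-mention"].append((sent_idx, m_idx))

    # Mention-mention edges: same sentence ("intra"), or same entity
    # appearing in different sentences ("inter").
    for (a, (sent_a, ent_a)), (b, (sent_b, ent_b)) in combinations(
        enumerate(mentions), 2
    ):
        if sent_a == sent_b:
            edges["mention-mention-intra"].append((a, b))
        elif ent_a == ent_b:
            edges["mention-mention-inter"].append((a, b))

    return dict(edges)


# Toy document: 3 sentences; entity "A" is mentioned in sentences 0 and 2.
toy_edges = build_hetero_edges(3, [(0, "A"), (0, "B"), (2, "A")])
```

In this toy example the cross-sentence link between the two mentions of entity "A" is what lets information flow between sentences 0 and 2, which is the kind of scattered-argument case the introduction describes.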

Architecture
Overall Results

1. Package Description

GIT/
├─ dee/
    ├── __init__.py
    ├── base_task.py
    ├── dee_task.py
    ├── ner_task.py
    ├── dee_helper.py: data features construction and evaluation utils
    ├── dee_metric.py: data evaluation utils
    ├── config.py: process command arguments
    ├── dee_model.py: GIT model
    ├── ner_model.py
    ├── transformer.py: transformer module
    ├── utils.py: utils
├─ run_dee_task.py: the main entry
├─ train_multi.sh
├─ run_train.sh: script for training (including evaluation)
├─ run_eval.sh: script for evaluation
├─ Exps/: experiment outputs
├─ Data.zip
├─ Data/: created by unzipping Data.zip
├─ LICENSE
├─ README.md

2. Environments

python (3.6.9)
cuda (11.1)
Ubuntu 18.04 (kernel 5.4.0-73-generic)

3. Dependencies

numpy (1.19.5)
torch (1.8.1+cu111)
pytorch-pretrained-bert (0.4.0)
dgl-cu111 (0.6.1)
tensorboardX (2.2)

PS: The environments and dependencies listed here are different from those used in our paper, so the results may differ slightly.
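Since results may vary with package versions, it can help to compare the installed versions against the ones listed above before training. The following is a small sketch, not part of the repository; the names `EXPECTED` and `check_versions` are hypothetical. It falls back to `pkg_resources` on Python versions older than 3.8, where `importlib.metadata` is not available.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # older interpreters, e.g. the Python 3.6.9 listed above
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


# Versions copied from the Dependencies list above.
EXPECTED = {
    "numpy": "1.19.5",
    "torch": "1.8.1+cu111",
    "pytorch-pretrained-bert": "0.4.0",
    "dgl-cu111": "0.6.1",
    "tensorboardX": "2.2",
}


def check_versions(expected=EXPECTED, lookup=version):
    """Return {package: (wanted, installed_or_None, matches)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {wanted}, found {installed} [{status}]")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the configuration listed here.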

4. Preparation

Unzip Data.zip to get a Data folder, which contains the training/dev/test data.
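If no `unzip` tool is at hand, the archive can also be unpacked with the Python standard library. This is a minimal sketch that assumes it is run from the repository root, where Data.zip sits; the helper name `unpack_data` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def unpack_data(archive="Data.zip", dest="."):
    """Extract the archive into dest and return the sorted list of members."""
    archive = Path(archive)
    if not archive.is_file():
        raise FileNotFoundError(
            f"{archive} not found; run this from the repository root"
        )
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted(zf.namelist())
```

Calling `unpack_data()` with the defaults extracts Data.zip into the current directory; printing the returned list is a quick way to see which files the archive contains.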

5. Training

>> bash run_train.sh

6. Evaluation

>> bash run_eval.sh

(Evaluation is also run automatically after training.)

7. License

This project is licensed under the MIT License - see the LICENSE file for details.

8. Citation

If you use this work or code, please kindly cite the following paper:

@inproceedings{xu-etal-2021-git,
    title = "Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker",
    author = "Runxin Xu  and
      Tianyu Liu  and
      Lei Li and
      Baobao Chang",
    booktitle = "The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
