Code repo for EMNLP21 paper "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation"


Zero-Shot Information Extraction as a Unified Text-to-Triple Translation

Source code repo for paper Zero-Shot Information Extraction as a Unified Text-to-Triple Translation, EMNLP 2021.

Installation

git clone --recursive git@github.com:cgraywang/deepex.git
cd ./deepex
conda create --name deepex python=3.7 -y
conda activate deepex
pip install -r requirements.txt
pip install -e .

Requires PyTorch version 1.5.1 or above with CUDA. PyTorch 1.7.1 with CUDA 10.1 is tested. Please refer to https://pytorch.org/get-started/locally/ for installing PyTorch.
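One way to check an existing environment against the requirement above is sketched below. The helper names `parse_version` and `meets_minimum` are illustrative assumptions and are not part of this repo; `torch.__version__` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
import re

MIN_TORCH = (1, 5, 1)  # minimum PyTorch version stated in this README


def parse_version(version: str) -> tuple:
    """Extract major.minor.patch, ignoring suffixes such as '+cu101'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. "2.0") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Return True if the version is at least the required minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        print("torch", torch.__version__, "ok:", meets_minimum(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
```

Comparing parsed integer tuples rather than raw strings avoids mistakes such as ranking "1.10.0" below "1.5.1".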

Dataset Preparation

Relation Classification

FewRel

Add the --prepare-rc-dataset argument when running the scripts in the "Scripts for Reproducing Results" section to have the script prepare the FewRel dataset automatically.

Alternatively, download and prepare the FewRel dataset manually using the following script:

bash scripts/rc/prep_FewRel.sh

The processed data will be stored at data/FewRel/data.jsonl.
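To inspect the processed file, a minimal sketch follows. It assumes only that the .jsonl extension means the JSON Lines convention (one JSON value per line); the record fields are not documented here, so the sketch prints whatever keys it finds rather than assuming a schema. The helper name `read_jsonl` is hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    data_path = Path("data/FewRel/data.jsonl")  # output path stated in this README
    if data_path.exists():
        records = list(read_jsonl(data_path))
        print(len(records), "records")
        if records and isinstance(records[0], dict):
            print("fields of first record:", sorted(records[0]))
    else:
        print(data_path, "not found; run the preparation script first")
```

The same helper should apply to data/TACRED/data.jsonl if that file follows the same convention.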

TACRED

TACRED is licensed by the LDC, so first download the TACRED dataset from the LDC. The downloaded file should be named tacred_LDC2018T24.tgz.

After downloading and correctly naming the TACRED .tgz data file, add the --prepare-rc-dataset argument when running the scripts in the "Scripts for Reproducing Results" section to have the script prepare the TACRED dataset automatically.

Alternatively, prepare the downloaded TACRED dataset manually using the following script:

bash scripts/rc/prep_TACRED.sh

The processed data will be stored at data/TACRED/data.jsonl.

Scripts for Reproducing Results

This section contains the scripts for running the tasks with the default settings (e.g., the bert-large-cased model on 8 CUDA devices with a per-device batch size of 4).

To modify the settings, see the Arguments section.

Open Information Extraction

bash tasks/OIE_2016.sh
bash tasks/PENN.sh
bash tasks/WEB.sh
bash tasks/NYT.sh

Relation Classification

bash tasks/FewRel.sh
bash tasks/TACRED.sh

Arguments

General script:

python scripts/manager.py --task=<task_name> <other_args>

The default setting is:

python scripts/manager.py --task=<task_name> --model="bert-large-cased" --beam-size=6 \
                          --max-distance=2048 --batch-size-per-device=4 --stage=0 \
                          --cuda=0,1,2,3,4,5,6,7

Each task is already wrapped in one of the .sh files in tasks/ listed above, which use the default arguments.
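When only a few defaults need to change, the command line can be assembled programmatically. The sketch below is a hypothetical helper, not part of this repo: `build_command` and `DEFAULTS` are illustrative names, and the flag names and default values are copied from the default setting shown above.

```python
import shlex

# Defaults as listed in this README.
DEFAULTS = {
    "model": "bert-large-cased",
    "beam-size": 6,
    "max-distance": 2048,
    "batch-size-per-device": 4,
    "stage": 0,
    "cuda": "0,1,2,3,4,5,6,7",
}


def build_command(task, **overrides):
    """Return the argv list for scripts/manager.py.

    Keyword names use '_' in place of '-', e.g. beam_size=4.
    """
    options = dict(DEFAULTS)
    for key, value in overrides.items():
        name = key.replace("_", "-")
        if name not in options:
            raise KeyError(f"unknown option: --{name}")
        options[name] = value
    argv = ["python", "scripts/manager.py", f"--task={task}"]
    argv += [f"--{name}={value}" for name, value in options.items()]
    return argv


# Example: FewRel with the smaller model on two GPUs.
cmd = build_command("FewRel", model="bert-base-cased", cuda="0,1")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the task.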

The following are the most important command-line arguments for the scripts/manager.py script:

  • --task: The task to run. Supported tasks are OIE_2016, WEB, NYT, PENN, FewRel and TACRED.
  • --model: The pre-trained model used to generate the attention matrices on which beam search is performed. Supported models are bert-base-cased and bert-large-cased.
  • --beam-size: The beam size during beam search.
  • --batch-size-per-device: The batch size on a single device.
  • --stage: Run the task starting from an intermediate stage:
    • --stage=0: data preparation and beam-search
    • --stage=1: post processing
    • --stage=2: ranking
    • --stage=3: evaluation
  • --prepare-rc-dataset: If true, automatically run the relation classification dataset preparation scripts. Enable this argument only for relation classification tasks (i.e., FewRel and TACRED).
  • --cuda: The CUDA GPU devices to use, as a comma-separated list of device indices (e.g., 0,1,2,3).

Run python scripts/manager.py -h for the full list.
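The --stage values described above can be read as an index into an ordered pipeline. The sketch below illustrates that reading under the assumption that starting from stage N runs stage N and every later stage; it is not the logic in scripts/manager.py, and `stages_to_run` is a hypothetical name.

```python
# Stage names as listed under --stage in this README, in pipeline order.
STAGES = [
    "data preparation and beam-search",
    "post processing",
    "ranking",
    "evaluation",
]


def stages_to_run(start_stage: int):
    """Return the stages executed when resuming from start_stage (assumed semantics)."""
    if not 0 <= start_stage < len(STAGES):
        raise ValueError(f"--stage must be between 0 and {len(STAGES) - 1}")
    return STAGES[start_stage:]


# Resuming at stage 2 skips beam search and post processing.
for name in stages_to_run(2):
    print(name)
```

Under this reading, resuming from a later stage reuses the outputs of the earlier stages, which is useful because beam search is likely the most expensive step.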

Results

NOTE

We obtain results that match or improve on those reported in the paper. We will release the code and datasets for the factual probe task soon!

Related Work

In src/deepex/model/kgm.py, we implement an extended version of the beam search algorithm proposed in Language Models are Open Knowledge Graphs.

Citation

@inproceedings{wang-etal-2021-deepex,
    title = "Zero-Shot Information Extraction as a Unified Text-to-Triple Translation",
    author = "Chenguang Wang and Xiao Liu and Zui Chen and Haoyun Hong and Jie Tang and Dawn Song",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    year = "2021",
    publisher = "Association for Computational Linguistics"
}

@article{wang-etal-2020-language,
    title = "Language Models are Open Knowledge Graphs",
    author = "Chenguang Wang and Xiao Liu and Dawn Song",
    journal = "arXiv preprint arXiv:2010.11967",
    year = "2020"
}