Code to reproduce the results of the paper 'Towards Realistic Few-Shot Relation Extraction' (EMNLP 2021)

Overview

Realistic Few-Shot Relation Extraction

This repository contains code to reproduce the results in the paper "Towards Realistic Few-Shot Relation Extraction" to appear in The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021). This code is not intended to be modified or reused. It is a fork of an existing FewRel repository with some modifications.

Fine-tuning

The following command is to fine-tune a pre-trained model on a training dataset complying with the FewRel's format (see the Dataset section below).

python -m fewrel.fewrel_eval \
  --train train_wiki \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \
  --train_iter 10000 \
  --val_iter 1000 \
  --val_step 2000 \
  --test_iter 2000

The above command will dump the fine-tuned model under ./checkpoint. The following command can be used to get the overall accuracy for the fine-tuned model.

Overall accuracy

python -m fewrel.fewrel_eval \
  --only_test \
  --test val_wiki \
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root data/fewrel \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path} \
  --test_iter 2000

[email protected] for individual relations

Precision at 50 can be calculated using the following command

python -m fewrel.alt_eval \
  --test {test_file_name_without_extension} \ # e.g., tacred_org 
  --encoder {"cnn", "bert", "roberta", "luke"} \
  --pool {"cls", "cat_entity_reps"} \
  --data_root {path_to_data_folder} \
  --pretrain_ckpt {pretrained_model_path} \ # needed for getting model config
  --load_ckpt {trained_checkpoint_path}

Pre-trained models

In this work, several encoders are experimented with including CNN, BERT, SpanBERT, RoBERTa-base, RoBERTa-large, and LUKE-base. Most pre-trained models can be downloaded from Hugging Face Transformers, and LUKE-base can be downloaded from its original GitHub repository.

Note: the original LUKE code depends on an older version of HuggingFace Transformers, which is not compatible with the version used in this repository. To experiment with LUKE, please run script ./checkout_out_luke.sh. This will first clone the original LUKE repository, apply the necessary changes to make luke compatible with this repo, and move the LUKE module to the correct place to make sure the code runs correctly.

Dataset

The original FewRel dataset has already be contained in the github repo (here)[./data/fewrel]. To convert other dataset (e.g., TACRED) to the FewRel format, one could use ./scripts/prep_more_data.py.

./scripts/select_rel.py is a script to augment an existing dataset with relations from another dataset. For example, to add a list of relations from dataset source.json to destination.json and dump the merged dataset to a file output.json, one can use the following command:

python scripts/select_rel.py add_rel \
  --src source.json \
  --dst destination.json \
  --output output.json \
  --rels {relations_delimitated_by_space}
Owner
Bloomberg
Bloomberg
End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

Image captioning End-to-end image captioning with EfficientNet-b3 + LSTM with Attention Model is seq2seq model. In the encoder pretrained EfficientNet

2 Feb 10, 2022
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Hugging Face 15k Jan 02, 2023
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

ALBERT ***************New March 28, 2020 *************** Add a colab tutorial to run fine-tuning for GLUE datasets. ***************New January 7, 2020

Google Research 3k Dec 26, 2022
自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器

ja-timex 自然言語で書かれた時間情報表現を抽出/規格化するルールベースの解析器 概要 ja-timex は、現代日本語で書かれた自然文に含まれる時間情報表現を抽出しTIMEX3と呼ばれるアノテーション仕様に変換することで、プログラムが利用できるような形に規格化するルールベースの解析器です。

Yuki Okuda 116 Nov 09, 2022
Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 models for speech recognition

Wav2Vec2 STT Python Beta Software Simple Python library, distributed via binary wheels with few direct dependencies, for easily using wav2vec 2.0 mode

David Zurow 22 Dec 29, 2022
An open source library for deep learning end-to-end dialog systems and chatbots.

DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re

Neural Networks and Deep Learning lab, MIPT 6k Dec 30, 2022
Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine

Semantic search through Wikipedia with the Weaviate vector search engine Weaviate is an open source vector search engine with build-in vectorization a

SeMI Technologies 191 Dec 26, 2022
Automatically search Stack Overflow for the command you want to run

stackshell Automatically search Stack Overflow (and other Stack Exchange sites) for the command you want to ru Use the up and down arrows to change be

circuit10 22 Oct 27, 2021
Opal-lang - A WIP programming language based on Python

thanks to aphitorite for the beautiful logo! opal opal is a WIP transcompiled pr

3 Nov 04, 2022
CMeEE 数据集医学实体抽取

医学实体抽取_GlobalPointer_torch 介绍 思想来自于苏神 GlobalPointer,原始版本是基于keras实现的,模型结构实现参考现有 pytorch 复现代码【感谢!】,基于torch百分百复现苏神原始效果。 数据集 中文医学命名实体数据集 点这里申请,很简单,共包含九类医学

85 Dec 28, 2022
An implementation of WaveNet with fast generation

pytorch-wavenet This is an implementation of the WaveNet architecture, as described in the original paper. Features Automatic creation of a dataset (t

Vincent Herrmann 858 Dec 27, 2022
Applied Natural Language Processing in the Enterprise - An O'Reilly Media Publication

Applied Natural Language Processing in the Enterprise This is the companion repo for Applied Natural Language Processing in the Enterprise, an O'Reill

Applied Natural Language Processing in the Enterprise 95 Jan 05, 2023
Huggingface Transformers + Adapters = ❤️

adapter-transformers A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models adapter-transformers is an extension of

AdapterHub 1.2k Jan 09, 2023
Just a Basic like Language for Zeno INC

zeno-basic-language Just a Basic like Language for Zeno INC This is written in 100% python. this is basic language like language. so its not for big p

Voidy Devleoper 1 Dec 18, 2021
Text Analysis & Topic Extraction on Android App user reviews

AndroidApp_TextAnalysis Hi, there! This is code archive for Text Analysis and Topic Extraction from user_reviews of Android App. Dataset Source : http

Fitrie Ratnasari 1 Feb 14, 2022
Chatbot for the Chatango messaging platform

BroiestBot The baddest bot in the game right now. Uses the ch.py framework for joining Chantango rooms and responding to user messages. Commands If a

Todd Birchard 3 Jan 17, 2022
Finding Label and Model Errors in Perception Data With Learned Observation Assertions

Finding Label and Model Errors in Perception Data With Learned Observation Assertions This is the project page for Finding Label and Model Errors in P

Stanford Future Data Systems 17 Oct 14, 2022
Problem: Given a nepali news find the category of the news

Classification of category of nepali news catorgory using different algorithms Problem: Multiclass Classification Approaches: TFIDF for vectorization

pudasainishushant 2 Jan 09, 2022
Python module (C extension and plain python) implementing Aho-Corasick algorithm

pyahocorasick pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search meaning that you can find mult

Wojciech Muła 763 Dec 27, 2022
Code for the paper "Flexible Generation of Natural Language Deductions"

Code for the paper "Flexible Generation of Natural Language Deductions"

Kaj Bostrom 12 Nov 11, 2022