Utilizing RBERT model for KLUE Relation Extraction task

Last update: Nov 15, 2022

Related tags

Text Data & NLP KLUE-RBERT

Overview

RBERT for Relation Extraction task for KLUE

Project Description

Relation Extraction task is one of the task of Korean Language Understanding Evaluation(KLUE) Benchmark.
Relation extraction can be defined as multiclass classification task for relationship between subject entity and object entity.
Classes are such as no_relation, per:employee_of, org:founded_by... totaling 30 labels.
This repo contains custom fine-tuning method utilizing monologg's R-BERT Implementation.
Custom punctuations with Pororo NER has been added to the dataset prior to the model's training.
If you want to refer to the experimentation note such as punctuation method of the entity, please refer to the blog post

Arguments Usage

Argument	type	Default	Explanation
batch_size	int	40	batch size for training and inferece
num_folds	int	5	number of fold for Stratified KFold
num_train_epochs	int	5	number of epochs for training
loss	str	focalloss	loss function
gamma	float	1.0	focalloss's gamma value
optimizer	str	adamp	optimizer for training
scheduler	str	get_cosine_schedule_with_warmup	learning rate scheduler
learning_rate	float	0.00005	initial learning rate
weight_decay	float	0.01	Loss function's weight decay, preventing overfit
warmup_step	int	500
debug	bool	false	debug with CPU device for better error representation
dropout_rate	float	0.1
save_steps	int	100	number of steps for saving the model
evaluation_steps	int	100	number of step until the evaluation
metric_for_best_model	str	eval/loss	the metric for determining which is the best model
load_best_model_at_end	bool	True

References

Authorship

Hardware

GPU : Tesla V100 32GB

Owner

snoop2head

break, compose, display

snoop2head

GitHub Repository https://snoop2head.github.io/Relation-Extraction/

A python package for deep multilingual punctuation prediction.

This python library predicts the punctuation of English, Italian, French and German texts. We developed it to restore the punctuation of transcribed spoken language.

27 Dec 22, 2022

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which

94 Dec 30, 2022

A script that automatically creates a branch name using google translation api and jira api

About google translation api와 jira api을 사용하여 자동으로 브랜치 이름을 만들어주는 스크립트 Setup 환경변수에 다음 3가지를 등록해야 한다. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

Dual languaged (rus+eng) tool for packing and unpacking archives of Silky Engine.

SilkyArcTool English Dual languaged (rus+eng) GUI tool for packing and unpacking archives of Silky Engine. It is not the same arc as used in Ai6WIN. I

5 Sep 15, 2022

Natural Language Processing Best Practices & Examples

NLP Best Practices In recent years, natural language processing (NLP) has seen quick growth in quality and usability, and this has helped to drive bus

6.1k Dec 31, 2022

RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and rearranging captions and pictures. Unlike other versions of the model we use BERT for text encoder and SWIN transformer for image encoder.

ruCLIP-SB RuCLIP-SB (Russian Contrastive Language–Image Pretraining SWIN-BERT) is a multimodal model for obtaining images and text similarities and re

5 Apr 13, 2022

Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

gpt-2-simple A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifical

3.1k Jan 07, 2023

Question answering app is used to answer for a user given question from user given text.

Question answering app is used to answer for a user given question from user given text.It is created using HuggingFace's transformer pipeline and streamlit python packages.

3 Apr 05, 2022

Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

2k Dec 27, 2022

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products

Contains analysis of trends from Fitbit Dataset (source: Kaggle) to see how the trends can be applied to Bellabeat customers and Bellabeat products.

2 Jan 12, 2022

Dust model dichotomous performance analysis

Dust-model-dichotomous-performance-analysis Using a collated dataset of 90,000 dust point source observations from 9 drylands studies from around the

1 Dec 17, 2021

An example project using OpenPrompt under pytorch-lightning for prompt-based SST2 sentiment analysis model

pl_prompt_sst An example project using OpenPrompt under the framework of pytorch-lightning for a training prompt-based text classification model on SS

5 Oct 21, 2022

Resources for "Natural Language Processing" Coursera course.

Natural Language Processing course resources This github contains practical assignments for Natural Language Processing course by Higher School of Eco

1.1k Jan 01, 2023

Journey is a NLP-Powered Developer assistant

Journey Journey is a NLP-Powered Developer assistant Using on the powerful Natural Language Processing library Mindmeld, this projects aims to assist

21 Dec 11, 2022

simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.

Quickly train T5 models in just 3 lines of code + ONNX support simpleT5 is built on top of PyTorch-lightning ⚡️ and Transformers 🤗 that lets you quic

220 Dec 30, 2022

TTS is a library for advanced Text-to-Speech generation.

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pretra

6.5k Jan 08, 2023

code for modular summarization work published in ACL2021 by Krishna et al

This repository contains the code for running modular summarization pipelines as described in the publication Krishna K, Khosla K, Bigham J, Lipton ZC

21 Nov 24, 2022

BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions

BERTopic BERTopic is a topic modeling technique that leverages 🤗 transformers and c-TF-IDF to create dense clusters allowing for easily interpretable

3.6k Jan 07, 2023

EMNLP 2021 paper "Pre-train or Annotate? Domain Adaptation with a Constrained Budget".

Pre-train or Annotate? Domain Adaptation with a Constrained Budget This repo contains code and data associated with EMNLP 2021 paper "Pre-train or Ann

8 Dec 17, 2021

An open source library for deep learning end-to-end dialog systems and chatbots.

DeepPavlov is an open-source conversational AI library built on TensorFlow, Keras and PyTorch. DeepPavlov is designed for development of production re

6k Dec 31, 2022