KoRean based ELECTRA pre-trained models (KR-ELECTRA) for Tensorflow and PyTorch

Overview

KoRean based ELECTRA (KR-ELECTRA)

This is a release of a Korean-specific ELECTRA model with comparable or better performances developed by the Computational Linguistics Lab at Seoul National University. Our model shows remarkable performances on tasks related to informal texts such as review documents, while still showing comparable results on other kinds of tasks.

Released Model

We pre-trained our KR-ELECTRA model following a base-scale model of ELECTRA. We trained the model based on Tensorflow-v1 using a v3-8 TPU of Google Cloud Platform.

Model Details

We followed the training parameters of the base-scale model of ELECTRA.

Hyperparameters
model # of layers embedding size hidden size # of heads
Discriminator 12 768 768 12
Generator 12 768 256 4
Pretraining
batch size train steps learning rates max sequence length generator size
256 700000 2e-4 128 0.33333

Training Dataset

34GB Korean texts including Wikipedia documents, news articles, legal texts, news comments, product reviews, and so on. These texts are balanced, consisting of the same ratios of written and spoken data.

Vocabulary

vocab size 30,000

We used morpheme-based unit tokens for our vocabulary based on the Mecab-Ko morpheme analyzer.

Download Link

  • Tensorflow-v1 model (download)

  • PyTorch models on HuggingFace

from transformers import ElectraModel, ElectraTokenizer

model = ElectraModel.from_pretrained("snunlp/KR-ELECTRA-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("snunlp/KR-ELECTRA-discriminator")

Finetuning

We used and slightly edited the finetuning codes from KoELECTRA, with additionally adjusted hyperparameters. You can download the codes and config files that we used for our model.

python3 run_seq_cls.py --task nsmc --config_file kr-electra.json
python3 run_seq_cls.py --task kornli --config_file kr-electra.json
python3 run_seq_cls.py --task paws --config_file kr-electra.json
python3 run_seq_cls.py --task question-pair --config_file kr-electra.json
python3 run_seq_cls.py --task korsts --config_file kr-electra.json
python3 run_seq_cls.py --task korsts --config_file kr-electra.json
python3 run_ner.py --task naver-ner --config_file kr-electra.json
python3 run_squad.py --task korquad --config_file kr-electra.json

Experimental Results

NSMC
(acc)
Naver NER
(F1)
PAWS
(acc)
KorNLI
(acc)
KorSTS
(spearman)
Question Pair
(acc)
KorQuaD (Dev)
(EM/F1)
Korean-Hate-Speech (Dev)
(F1)
KoBERT 89.59 87.92 81.25 79.62 81.59 94.85 51.75 / 79.15 66.21
XLM-Roberta-Base 89.03 86.65 82.80 80.23 78.45 93.80 64.70 / 88.94 64.06
HanBERT 90.06 87.70 82.95 80.32 82.73 94.72 78.74 / 92.02 68.32
KoELECTRA-Base 90.33 87.18 81.70 80.64 82.00 93.54 60.86 / 89.28 66.09
KoELECTRA-Base-v2 89.56 87.16 80.70 80.72 82.30 94.85 84.01 / 92.40 67.45
KoELECTRA-Base-v3 90.63 88.11 84.45 82.24 85.53 95.25 84.83 / 93.45 67.61
KR-ELECTRA (ours) 91.168 87.90 82.05 82.51 85.41 95.51 84.93 / 93.04 74.50

The baseline results are brought from KoELECTRA's.

Citation

@misc{kr-electra,
  author = {Lee, Sangah and Hyopil Shin},
  title = {KR-ELECTRA: a KoRean-based ELECTRA model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snunlp/KR-ELECTRA}}
}
Learning to Estimate Hidden Motions with Global Motion Aggregation

Learning to Estimate Hidden Motions with Global Motion Aggregation (GMA) This repository contains the source code for our paper: Learning to Estimate

Shihao Jiang (Zac) 221 Dec 18, 2022
[WWW 2022] Zero-Shot Stance Detection via Contrastive Learning

PT-HCL for Zero-Shot Stance Detection The code of this repository is constantly being updated... Please look forward to it! Introduction This reposito

Akuchi 12 Dec 21, 2022
A Library for Modelling Probabilistic Hierarchical Graphical Models in PyTorch

A Library for Modelling Probabilistic Hierarchical Graphical Models in PyTorch

Korbinian Pöppel 47 Nov 28, 2022
Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting (ICCV, 2021)

DKPNet ICCV 2021 Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting Baseline of DKPNet is availa

19 Oct 14, 2022
Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh

generate_cloud_points Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh. Run python disp_mesh.py Or you

Peng Yu 2 Dec 24, 2021
An intelligent, flexible grammar of machine learning.

An english representation of machine learning. Modify what you want, let us handle the rest. Overview Nylon is a python library that lets you customiz

Palash Shah 79 Dec 02, 2022
Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020)

Causality In Traffic Accident (Under Construction) Repository for Traffic Accident Benchmark for Causality Recognition (ECCV 2020) Overview Data Prepa

Tackgeun 21 Nov 20, 2022
[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

Garment4D [PDF] | [OpenReview] | [Project Page] Overview This is the codebase for our NeurIPS 2021 paper Garment4D: Garment Reconstruction from Point

Fangzhou Hong 112 Dec 23, 2022
TimeSHAP explains Recurrent Neural Network predictions.

TimeSHAP TimeSHAP is a model-agnostic, recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes even

Feedzai 90 Dec 18, 2022
Logistic Bandit experiments. Official code for the paper "Jointly Efficient and Optimal Algorithms for Logistic Bandits".

Code for the paper Jointly Efficient and Optimal Algorithms for Logistic Bandits, by Louis Faury, Marc Abeille, Clément Calauzènes and Kwang-Sun Jun.

Faury Louis 1 Jan 22, 2022
A community run, 5-day PyTorch Deep Learning Bootcamp

Deep Learning Winter School, November 2107. Tel Aviv Deep Learning Bootcamp : http://deep-ml.com. About Tel-Aviv Deep Learning Bootcamp is an intensiv

Shlomo Kashani. 1.3k Sep 04, 2021
Unsupervised Image-to-Image Translation

UNIT: UNsupervised Image-to-image Translation Networks Imaginaire Repository We have a reimplementation of the UNIT method that is more performant. It

Ming-Yu Liu 劉洺堉 1.9k Dec 26, 2022
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
Kohei's 5th place solution for xview3 challenge

xview3-kohei-solution Usage This repository assumes that the given data set is stored in the following locations: $ ls data/input/xview3/*.csv data/in

Kohei Ozaki 2 Jan 17, 2022
Imaging, analysis, and simulation software for radio interferometry

ehtim (eht-imaging) Python modules for simulating and manipulating VLBI data and producing images with regularized maximum likelihood methods. This ve

Andrew Chael 5.2k Dec 28, 2022
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022
simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Teach with Dynamic Loss Functions This repo contains the simple demo for the NeurIPS-18 paper: Learning to Teach with Dynamic Loss Functio

Lijun Wu 15 Dec 30, 2021
VIsually-Pivoted Audio and(N) Text

VIP-ANT: VIsually-Pivoted Audio and(N) Text Code for the paper Connecting the Dots between Audio and Text without Parallel Data through Visual Knowled

Yän.PnG 16 Nov 04, 2022
A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

SEAL ⠀⠀⠀ A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019) Abstract Node classification an

Benedek Rozemberczki 202 Dec 27, 2022
Cancer-and-Tumor-Detection-Using-Inception-model - In this repo i am gonna show you how i did cancer/tumor detection in lungs using deep neural networks, specifically here the Inception model by google.

Cancer-and-Tumor-Detection-Using-Inception-model In this repo i am gonna show you how i did cancer/tumor detection in lungs using deep neural networks

Deepak Nandwani 1 Jan 01, 2022