APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on the online crowdsourcing platform DeepNatural AI.

Sample Code :

Download

You can download the APEACH benchmark set from APEACH/test.csv in this repository.
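As a minimal sketch of how the downloaded file might be inspected, the following uses only the Python standard library. This README does not document the CSV's columns, so the code discovers them from the header row instead of assuming names. The helper names `load_rows` and `summarize` and the optional `label_column` argument are hypothetical and not part of this repository.

```python
import csv
from collections import Counter
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file with a header row into a list of dicts."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def summarize(rows, label_column=None):
    """Return the row count, the column names and, optionally, label counts."""
    columns = list(rows[0].keys()) if rows else []
    summary = {"num_rows": len(rows), "columns": columns}
    # Label counts are only added when the caller names an existing column.
    if label_column is not None and label_column in columns:
        summary["label_counts"] = dict(Counter(r[label_column] for r in rows))
    return summary


if __name__ == "__main__":
    path = Path("APEACH/test.csv")  # relative path as given in this README
    if path.exists():
        print(summarize(load_rows(path)))
    else:
        print(f"{path} not found; clone the repository first.")
```

Once the printed column names are known, passing the label column's name as `label_column` gives a quick view of the class balance.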

Dataset Description

APEACH: A hate speech evaluation dataset generated in 2021, using the generation method described in the APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Name	Beep! Dev Dataset	APEACH (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885
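To read the table at a glance, the gap between the two columns can be computed per model. The sketch below copies the scores from the table above. The helper names `score_gaps` and `best_model` are illustrative only, and the code assumes that higher scores are better, since this README does not name the metric.

```python
# Scores copied from the results table: (Beep! dev, APEACH)
SCORES = {
    "SoongsilBERT-Base": (0.8261, 0.8424),
    "SoongsilBERT-Small": (0.8149, 0.8228),
    "KcBERT-base": (0.8088, 0.8086),
    "KcBERT-large": (0.8295, 0.8116),
    "DistillKoBERT": (0.7570, 0.7715),
    "KoELECTRA-V3": (0.7920, 0.8101),
    "KoBERT": (0.8030, 0.7885),
}


def score_gaps(scores):
    """Return (model, APEACH minus Beep!) pairs, largest gap first."""
    gaps = [
        (name, round(apeach - beep, 4))
        for name, (beep, apeach) in scores.items()
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


def best_model(scores, column):
    """Return the highest-scoring model for column 0 (Beep!) or 1 (APEACH)."""
    return max(scores, key=lambda name: scores[name][column])


if __name__ == "__main__":
    for name, gap in score_gaps(SCORES):
        print(f"{name:20s} {gap:+.4f}")
    print("Best on Beep! dev:", best_model(SCORES, 0))
    print("Best on APEACH:", best_model(SCORES, 1))
```

With these numbers, four of the seven models score higher on APEACH than on the Beep! dev set, and the top-ranked model differs between the two benchmarks.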

We also share the best-performing model on our dataset from this experiment as a checkpoint, a demo website, and an API.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work (*: equal contribution):

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Owner

Kevin-Yang
