A text augmentation tool for named entity recognition.

Last update: Oct 11, 2022

Overview

neraug

This Python library helps you augment text data for named entity recognition.

Augmentation Example

Examples are taken from An Analysis of Simple Data Augmentation for Named Entity Recognition.

Installation

To install the library:

pip install neraug

Usage

An example using one of the algorithms, DictionaryReplacement:

>>> from neraug.augmentator import DictionaryReplacement
>>> from neraug.scheme import IOBES

>>> ne_dic = {'Tokyo Big Sight': 'LOC'}
>>> augmentator = DictionaryReplacement(ne_dic, str.split, IOBES)
>>> x = ['I', 'went', 'to', 'Tokyo']
>>> y = ['O', 'O', 'O', 'S-LOC']
>>> x_augs, y_augs = augmentator.augment(x, y, n=1)   
>>> x_augs
[['I', 'went', 'to', 'Tokyo', 'Big', 'Sight']]
>>> y_augs
[['O', 'O', 'O', 'B-LOC', 'I-LOC', 'E-LOC']]
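
To make the example above easier to follow, here is a minimal, self-contained sketch of the idea behind dictionary replacement: an entity span is swapped for a dictionary phrase with the same label, and the phrase is re-tagged in IOBES. This is not neraug's implementation. The helper names `tag_phrase_iobes` and `replace_entities` are hypothetical, every entity is replaced unconditionally (the library may sample instead), and the input tags are assumed to be well-formed IOBES.

```python
import random


def tag_phrase_iobes(tokens, label):
    """Return IOBES tags for a phrase that forms a single entity of type `label`."""
    if len(tokens) == 1:
        return [f"S-{label}"]
    inner = [f"I-{label}"] * (len(tokens) - 2)
    return [f"B-{label}", *inner, f"E-{label}"]


def replace_entities(x, y, dictionary, tokenize=str.split, rng=random):
    """Replace every entity span in (x, y) with a same-label dictionary phrase.

    Hypothetical helper for illustration; assumes well-formed IOBES tags.
    """
    # Group the dictionary phrases by entity label.
    by_label = {}
    for phrase, label in dictionary.items():
        by_label.setdefault(label, []).append(phrase)

    new_x, new_y = [], []
    i = 0
    while i < len(x):
        tag = y[i]
        if tag == "O":
            new_x.append(x[i])
            new_y.append(tag)
            i += 1
            continue

        prefix, label = tag.split("-", 1)
        # Find the last index j of the span starting at i.
        j = i
        if prefix == "B":
            while j < len(y) - 1 and not y[j].startswith("E-"):
                j += 1

        candidates = by_label.get(label)
        if candidates:
            phrase_tokens = tokenize(rng.choice(candidates))
            new_x.extend(phrase_tokens)
            new_y.extend(tag_phrase_iobes(phrase_tokens, label))
        else:
            # No phrase with this label: keep the original span unchanged.
            new_x.extend(x[i:j + 1])
            new_y.extend(y[i:j + 1])
        i = j + 1
    return new_x, new_y


x_aug, y_aug = replace_entities(
    ["I", "went", "to", "Tokyo"],
    ["O", "O", "O", "S-LOC"],
    {"Tokyo Big Sight": "LOC"},
)
print(x_aug)
print(y_aug)
```

Because the replacement phrase has three tokens, the single-token tag S-LOC has to become B-LOC, I-LOC, E-LOC, which matches the output shown above.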

The library supports the following algorithms:

DictionaryReplacement
LabelWiseTokenReplacement
MentionReplacement
ShuffleWithinSegment

and the following tagging schemes:

IOB2
IOBES
BILOU
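
The three schemes listed above encode the same entity spans with different tag prefixes. The sketch below shows one way to convert between them in plain Python, without using neraug. The function names `iob2_to_iobes` and `iobes_to_bilou` are hypothetical and are not part of the library; the input is assumed to be well-formed IOB2.

```python
def iob2_to_iobes(tags):
    """Convert well-formed IOB2 tags to IOBES (hypothetical helper)."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            out.append(tag if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(tag if continues else f"E-{label}")
    return out


def iobes_to_bilou(tags):
    """Rename IOBES prefixes to BILOU: E becomes L (last), S becomes U (unit)."""
    rename = {"E": "L", "S": "U"}
    out = []
    for tag in tags:
        if tag == "O":
            out.append(tag)
        else:
            prefix, label = tag.split("-", 1)
            out.append(f"{rename.get(prefix, prefix)}-{label}")
    return out


iob2 = ["O", "B-LOC", "I-LOC", "I-LOC", "O", "B-PER"]
iobes = iob2_to_iobes(iob2)
print(iobes)                  # ['O', 'B-LOC', 'I-LOC', 'E-LOC', 'O', 'S-PER']
print(iobes_to_bilou(iobes))  # ['O', 'B-LOC', 'I-LOC', 'L-LOC', 'O', 'U-PER']
```

IOBES and BILOU differ only in naming, so the second conversion is a prefix rename, while going from IOB2 requires looking one tag ahead to find where each entity ends.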

Reference

Thanks to the following research:

An Analysis of Simple Data Augmentation for Named Entity Recognition

Citation

@misc{neraug,
  title={neraug: A data augmentation tool for named entity recognition},
  author={Hiroki Nakayama},
  url={https://github.com/Hironsan/neraug},
  year={2021}
}

Releases

v0.1.1 (Jul 22, 2021)

Remove tokenizer from MentionReplacement

v0.1.0 (Jul 22, 2021)

Owner

Hiroki Nakayama
