For converting Tagtog annotations into a CSV dataset

Last update: Dec 28, 2021

Overview

tagtog_relation_extraction

A tool for converting Tagtog annotations into a CSV dataset.

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents using the download button on that page

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction/
├── main.py
├── util.py
├── .gitignore
├── README.md
├── requirements.txt
└── Your_download_file_Name/
    ├── annotations-legend.json
    ├── ann.json/
    │   └── master/
    │       └── pool/
    ├── plain.html/
    │   └── pool/
    ├── guidelines.md
    └── README.md
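
Before running the conversion, it can help to confirm that the download was unpacked as expected. The following is a minimal sketch, not part of the repository: `find_missing_entries` and `EXPECTED_ENTRIES` are hypothetical names, and the list of entries is simply read off the tree above (the top-level `guidelines.md` and `README.md` are left out on the assumption that the conversion does not need them).

```python
from pathlib import Path

# Entries read off the directory tree above; a trailing "/" marks a directory.
# Assumption: these are the only parts of the download the conversion needs.
EXPECTED_ENTRIES = (
    "annotations-legend.json",
    "ann.json/master/pool/",
    "plain.html/pool/",
)


def find_missing_entries(download_dir):
    """Return the expected entries that are absent from download_dir."""
    root = Path(download_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry
        # Directories must exist as directories, files as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling `find_missing_entries("Your_download_file_Name")` from the repository root returns an empty list when the layout matches the tree.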

2. Install the required packages

tqdm==4.62.3
pandas==1.1.5
beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run

$ python main.py --path Your_download_file_Name

Result

1. Dataset file (dataset.csv)

CSV file whose rows follow the KLUE dataset format.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation
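
In the example row, `start_idx` and `end_idx` are character offsets into `sentence`, and `end_idx` is inclusive (`플랑크 상수` spans offsets 17 to 22). The sketch below checks that for one row. It assumes the tag cells are stored as Python dict literals, as they are printed above; `parse_tag` and `span_matches` are hypothetical helpers, not functions from `util.py`.

```python
import ast

# Example row copied from above; in practice rows would be read from dataset.csv.
row = {
    "sentence": "가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다",
    "sub_tag": "{'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}",
    "obj_tag": "{'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}",
    "label": "POH:no_relation",
}


def parse_tag(cell):
    """Turn a tag cell (assumed to be a Python dict literal) into a dict."""
    return ast.literal_eval(cell)


def span_matches(sentence, tag):
    """Check that the inclusive [start_idx, end_idx] span spells tag['word']."""
    return sentence[tag["start_idx"]:tag["end_idx"] + 1] == tag["word"]


sub = parse_tag(row["sub_tag"])
obj = parse_tag(row["obj_tag"])
print(span_matches(row["sentence"], sub), span_matches(row["sentence"], obj))
```

`ast.literal_eval` is used rather than `json.loads` because the cells use single quotes, which is not valid JSON.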

2. File for checking answers (answer_check.csv)

CSV file designed for checking entity tags and labels.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation

Restrictions

Entity labels must have the following form:

SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}

If your labels use a different scheme, you may need to modify util.py.
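
To illustrate the label form, here is a minimal sketch of how such a label could be split into its three parts. It is not the code in util.py: `parse_entity_label` is a hypothetical helper, and allowing hyphens inside the relation name is an assumption.

```python
def parse_entity_label(label):
    """Split 'SUBJ-{ENT_TYPE}-{RELATION_NAME}' into (role, ent_type, relation)."""
    # maxsplit=2 keeps any hyphens inside the relation name intact.
    parts = label.split("-", 2)
    if (
        len(parts) != 3
        or parts[0] not in ("SUBJ", "OBJ")
        or not parts[1]
        or not parts[2]
    ):
        raise ValueError(f"unexpected entity label: {label!r}")
    return tuple(parts)


role, ent_type, relation = parse_entity_label("SUBJ-POH-no_relation")
print(role, ent_type, relation)
```

A label that does not start with `SUBJ-` or `OBJ-`, or that lacks one of the three parts, raises `ValueError`, which makes non-conforming Tagtog labels easy to spot before conversion.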

Owner

hyeong
