IndoNLI: A Natural Language Inference Dataset for Indonesian

This repository contains the data and code accompanying our EMNLP 2021 paper "IndoNLI: A Natural Language Inference Dataset for Indonesian". The datasets used in our experiments are in the data directory:

indonli: human-annotated NLI data, split into train, val, and test (test_lay and test_expert)

diagnostic: subset of examples from test_expert that are annotated with linguistic and logical phenomena
translate_train.tar.gz: MNLI dataset translated to Indonesian (train and dev)
translate_train_small.tar.gz: a sample of translate_train, used for the translate_train_small experiment

The experiment code is in the experiment directory; see the README file there for details.
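As a quick way to inspect one of the splits listed above, the following is a minimal sketch rather than part of the released code. It assumes each split is stored as a JSON Lines file with one example per line and a "label" field. The path data/indonli/train.jsonl and the field name are assumptions, so adjust them to match the actual files in the data directory.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    examples = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                examples.append(json.loads(line))
    return examples


def label_counts(examples, label_key="label"):
    """Count how often each label occurs (the field name is an assumption)."""
    return Counter(ex[label_key] for ex in examples)


if __name__ == "__main__":
    # Hypothetical path: adjust to the actual file layout under data/indonli.
    split_path = Path("data/indonli/train.jsonl")
    if split_path.exists():
        train = load_jsonl(split_path)
        print(len(train), label_counts(train))
```

The same two helpers can be pointed at the val, test_lay, and test_expert files to compare label distributions across splits.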

License

We use premises taken from Indonesian Wikipedia, news, and web articles.

Wikipedia is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the GNU Free Documentation License (GFDL).

For the news genre, we use premise text from the Indonesian PUD and GSD treebanks provided by Universal Dependencies 2.5 (Zeman et al., 2019) and from IndoSum (Kurniawan and Louvan, 2018). The Indonesian PUD and GSD treebanks are licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA) and the Creative Commons Attribution-ShareAlike 4.0 International License (CC-BY-SA). IndoSum is licensed under the Apache License, Version 2.0.

Citation

If you use our corpus in your work, please consider citing our paper:

@inproceedings{indonli,
    title = "IndoNLI: A Natural Language Inference Dataset for Indonesian",
    author = "Mahendra, Rahmad and Aji, Alham Fikri and Louvan, Samuel and Rahman, Fahrurrozi and Vania, Clara",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
