Amazon Multilingual Counterfactual Dataset (AMCD)

Last update: Sep 20, 2022

This repository contains a dataset described in the paper:

I Wish I Would Have Loved This One, But I Didn’t – A Multilingual Dataset for Counterfactual Detection in Product Reviews. James O’Neill, Polina Rozenshtein, Ryuichi Kiryo, Motoko Kubota, Danushka Bollegala. EMNLP 2021 (also available on arXiv).

The dataset contains sentences from Amazon customer reviews (sampled from the Amazon product review dataset). Each sentence is annotated for counterfactual detection (CFD), a binary classification task: a sentence is labeled as either counterfactual or not. Counterfactual statements describe events that did not or cannot take place. They can often be identified as statements of the form "If p were true, then q would be true", that is, assertions whose antecedent (p) and consequent (q) are known or assumed to be false.

The key features of this dataset are:

The dataset is multilingual and contains sentences in English, German, and Japanese.
Labeling was performed by professional linguists, with quality control to ensure high-quality annotations.
The dataset is supplemented with annotation guidelines and definitions developed by professional linguists. We also provide clue word lists, likewise compiled by professional linguists, containing words that are typical of counterfactual sentences; these lists were used for initial data filtering.
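The clue-word filtering step mentioned above can be illustrated with a minimal sketch. It assumes a simple "keep any sentence that contains at least one clue word" rule; the clue words below are hypothetical placeholders chosen for illustration and are not the lists distributed with AMCD, and the function names are likewise hypothetical. Since Japanese text has no spaces between words, the sketch matches Japanese clue words as plain substrings, while English and German clue words must match on word boundaries.

```python
import re
from typing import Dict, List

# Hypothetical, illustrative clue words only; NOT the clue word lists
# that ship with AMCD.
CLUE_WORDS: Dict[str, List[str]] = {
    "en": ["wish", "if only", "would have"],
    "de": ["wünschte", "hätte", "wäre"],
    "ja": ["ばよかった", "ば良かった"],
}


def build_pattern(lang: str) -> re.Pattern:
    """Compile one regex that matches any clue word for the given language."""
    alternatives = "|".join(re.escape(w) for w in CLUE_WORDS[lang])
    if lang == "ja":
        # Japanese is written without spaces, so match clue words as substrings.
        return re.compile(alternatives)
    # For space-delimited languages, require word boundaries so that,
    # e.g., "wish" does not match inside "wishlist".
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def filter_candidates(sentences: List[str], lang: str) -> List[str]:
    """Keep only sentences containing at least one clue word.

    This is a recall-oriented pre-filter producing candidates for manual
    annotation, not a counterfactual classifier.
    """
    pattern = build_pattern(lang)
    return [s for s in sentences if pattern.search(s)]


if __name__ == "__main__":
    reviews = [
        "I wish it had a longer cable.",
        "Great value, arrived on time.",
        "Added to my wishlist.",
    ]
    print(filter_candidates(reviews, "en"))
    # → ['I wish it had a longer cable.']
```

A filter like this only narrows the pool of sentences worth sending to annotators: it will still pass sentences that use a clue word non-counterfactually, which is why the human labels remain the ground truth.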

Please see the paper for data statistics and a detailed description of data collection and annotation.

For the dataset format, please see README.txt.

Cite

If you use this dataset in your research, please cite the paper.

License Summary

The documentation is made available under the Creative Commons Attribution-ShareAlike 4.0 International License. See the LICENSE file.
