QuALITY: Question Answering with Long Input Texts, Yes!

Last update: Jan 02, 2023

Authors: Richard Yuanzhe Pang,* Alicia Parrish,* Nitish Joshi,* Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman (* = equal contribution)

Data link

Download QuALITY v0.9 (zip).

Paper preprint

You can read the paper on arXiv (arXiv:2112.08608).

Data README

Here are the explanations of the fields in the jsonl file. Each JSON line holds the set of validated questions that one writer wrote for one article.

article_id: String. A five-digit number uniquely identifying the article. In each split, there are exactly two lines containing the same article_id, because two writers wrote questions for the same article.
set_unique_id: String. The unique ID corresponding to the set of questions, which corresponds to the line of json. Each set of questions is written by the same writer.
batch_num: String. The batch number. Our data collection is split into two groups, and there are three batches in each group. [i][j] means the j-th batch in the i-th group. For example, 23 corresponds to the third batch in the second group.
writer_id: String. The anonymized ID of the writer who wrote this set of questions.
source: String. The source of the article.
title: String. The title of the article.
author: String. The author of the article.
topic: String. The topic of the article.
url: String. The URL of the original unprocessed source article.
license: String. The license information for the article.
article: String. The HTML of the article. A script that converts HTML to plain text is provided.
questions: A list of dictionaries explained below. Each line of json has a different number of questions because some questions were removed following validation.
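As a minimal sketch of how the top-level fields above can be read, the following Python parses JSON lines, groups the question sets by article_id, and decodes batch_num. The helper names and the sample values are made up for illustration and are not part of the released data or code.

```python
import json
from collections import defaultdict


def parse_records(lines):
    """Parse an iterable of JSON lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_article(records):
    """Map each article_id to the question sets (one per writer) written for it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["article_id"]].append(record)
    return dict(groups)


def decode_batch_num(batch_num):
    """Split a two-character batch_num such as "23" into (group, batch)."""
    group, batch = batch_num  # raises ValueError if it is not two characters
    return int(group), int(batch)


# Tiny made-up sample in the documented shape (not real QuALITY data).
sample_lines = [
    json.dumps({"article_id": "12345", "set_unique_id": "12345_A",
                "batch_num": "23", "writer_id": "w1", "questions": []}),
    json.dumps({"article_id": "12345", "set_unique_id": "12345_B",
                "batch_num": "11", "writer_id": "w2", "questions": []}),
]
records = parse_records(sample_lines)
by_article = group_by_article(records)
```

To read a real split, pass an open file handle to parse_records, since a file object iterates over its lines.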

As discussed, the value of questions is a list of dictionaries. Each dictionary has the following fields.

question: The question.
options: A list of four answer options.
gold_label: The correct answer, defined by a majority vote of 3 or 5 annotators + the original writer's label. The number corresponds to the option number (1-indexed) in options.
writer_label: The label the writer provided. The number corresponds to the option number (1-indexed) in options.
validation: A list of dictionaries containing the untimed validation results. Each dictionary contains the following fields.
- untimed_annotator_id: The anonymized annotator IDs corresponding to the untimed validation results shown in untimed_answer.
- untimed_answer: The responses in the untimed validation. Each question in the training set is annotated by three workers in most cases, and each question in the dev/test sets is annotated by five workers in most cases (see paper for exceptions).
- untimed_eval1_answerability: The responses (represented numerically) to the first eval question in untimed validation. We asked the raters: “Is the question answerable and unambiguous?” The values correspond to the following choices:
  - 1: Yes, there is a single answer choice that is the most correct.
  - 2: No, two or more answer choices are equally correct.
  - 3: No, it is unclear what the question is asking, or the question or answer choices are unrelated to the passage.
- untimed_eval2_context: The responses (represented numerically) to the second eval question in untimed validation. We asked the raters: “How much of the passage/text is needed as context to answer this question correctly?” The values correspond to the following choices:
  - 1: Only a sentence or two of context.
  - 2: At least a long paragraph or two of context.
  - 3: At least a third of the passage for context.
  - 4: Most or all of the passage for context.
- untimed_eval3_distractor: The responses to the third eval question in untimed validation. We asked the raters: “Which of the options that you did not select was the best "distractor" item (i.e., an answer choice that you might be tempted to select if you hadn't read the text very closely)?” The numbers correspond to the option numbers (1-indexed).
speed_validation: A list of dictionaries containing the speed validation results. Each dictionary contains the following fields.
- speed_annotator_id: The anonymized annotator IDs corresponding to the speed annotation results shown in speed_answer.
- speed_answer: The responses in the speed validation. Each question is annotated by five workers.
difficult: A binary value. 1 means that fewer than 50% of the speed annotations answer the question correctly, so we include this question in the hard subset. Otherwise, the value is 0. In our evaluations, we report one accuracy figure for the entire dataset, and a second for the difficult=1 subset.
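The two accuracy figures mentioned above can be computed along these lines. This is a sketch, not our evaluation code: the format of predictions (a list of 1-indexed option numbers aligned with the questions) is an assumption made for the example, and the sample questions are made up.

```python
def gold_option_text(question):
    """Return the text of the correct option; gold_label is 1-indexed."""
    return question["options"][question["gold_label"] - 1]


def accuracy_report(questions, predictions):
    """Compute accuracy over all questions and over the difficult == 1 subset.

    `predictions` holds 1-indexed option numbers aligned with `questions`
    (a hypothetical format chosen for this sketch).
    """
    correct_all = correct_hard = n_hard = 0
    for question, predicted in zip(questions, predictions):
        is_correct = predicted == question["gold_label"]
        correct_all += is_correct
        if question["difficult"] == 1:
            n_hard += 1
            correct_hard += is_correct
    return {
        "all": correct_all / len(questions) if questions else None,
        "hard": correct_hard / n_hard if n_hard else None,
    }


# Made-up questions in the documented shape (not real QuALITY data).
questions = [
    {"options": ["a", "b", "c", "d"], "gold_label": 2, "difficult": 0},
    {"options": ["a", "b", "c", "d"], "gold_label": 4, "difficult": 1},
    {"options": ["a", "b", "c", "d"], "gold_label": 1, "difficult": 1},
]
predictions = [2, 4, 3]
report = accuracy_report(questions, predictions)
```

Because gold_label is 1-indexed, subtract one before indexing into options, as gold_option_text does.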

Validation criteria for the questions

More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label (defined as the majority vote of validators' annotations together with the writer's provided label).
More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
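The two criteria above, together with the majority vote that defines gold_label, can be sketched as follows. The sketch assumes that each dictionary in validation holds one annotator's response with integer values. Returning None on a tied vote is also an assumption, since this README does not say how ties are broken.

```python
from collections import Counter


def majority_label(question):
    """Majority vote over the untimed answers plus the writer's label.

    Returns None on a tie (tie handling is an assumption of this sketch).
    """
    votes = [v["untimed_answer"] for v in question["validation"]]
    votes.append(question["writer_label"])
    ranked = Counter(votes).most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def share(values, wanted):
    """Fraction of `values` equal to `wanted` (0.0 for an empty list)."""
    return sum(1 for v in values if v == wanted) / len(values) if values else 0.0


def passes_validation(question):
    """Check both criteria: untimed accuracy and answerability above 50%."""
    votes = question["validation"]
    correct = share([v["untimed_answer"] for v in votes], question["gold_label"])
    answerable = share([v["untimed_eval1_answerability"] for v in votes], 1)
    return correct > 0.5 and answerable > 0.5


# Made-up question in the documented shape (not real QuALITY data).
question = {
    "gold_label": 3,
    "writer_label": 3,
    "validation": [
        {"untimed_annotator_id": "a1", "untimed_answer": 3, "untimed_eval1_answerability": 1},
        {"untimed_annotator_id": "a2", "untimed_answer": 3, "untimed_eval1_answerability": 1},
        {"untimed_annotator_id": "a3", "untimed_answer": 1, "untimed_eval1_answerability": 2},
    ],
}
```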

What are the `hard` questions?

More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label.
More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
More than 50% of annotators answer the question incorrectly in the speed validation setting. That is, more than 50% of the speed_answer annotations are incorrect.
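The first two conditions are the validation criteria, so for questions that survived validation the speed condition is what separates the hard subset. A minimal sketch of that check, assuming each dictionary in speed_validation holds one annotator's integer speed_answer (the sample question is made up):

```python
def speed_error_rate(question):
    """Fraction of speed_answer annotations that disagree with gold_label."""
    answers = [v["speed_answer"] for v in question["speed_validation"]]
    if not answers:
        return 0.0
    wrong = sum(1 for answer in answers if answer != question["gold_label"])
    return wrong / len(answers)


def is_hard_by_speed(question):
    """True when more than 50% of the speed annotations are incorrect."""
    return speed_error_rate(question) > 0.5


# Made-up question in the documented shape (not real QuALITY data).
question = {
    "gold_label": 2,
    "difficult": 1,
    "speed_validation": [
        {"speed_annotator_id": "s1", "speed_answer": 2},
        {"speed_annotator_id": "s2", "speed_answer": 1},
        {"speed_annotator_id": "s3", "speed_answer": 3},
        {"speed_annotator_id": "s4", "speed_answer": 4},
        {"speed_annotator_id": "s5", "speed_answer": 2},
    ],
}
```

Comparing is_hard_by_speed(question) with the stored difficult field is a quick consistency check when loading the data.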

Test set

The annotations for questions in the test set will not be released. We are currently working on a leaderboard. Stay tuned for an update by early January!

Code

The code for our baseline models will be released soon. Stay tuned for an update by early January!

Citation

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

Contact

{yzpang, alicia.v.parrish}@nyu.edu

Owner

ML² AT CILVR
