QuALITY: Question Answering with Long Input Texts, Yes!

Related tags

Deep Learningquality
Overview

QuALITY: Question Answering with Long Input Texts, Yes!

Authors: Richard Yuanzhe Pang,* Alicia Parrish,* Nitish Joshi,* Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman (* = equal contribution)

Data link

Download QuALITY v0.9 (zip).

Paper preprint

You can read the paper here.

Data README

Here are the explanations to the fields in the jsonl file. Each json line corresponds to the set of validated questions, corresponding to one article, written by one writer.

  • article_id: String. A five-digit number uniquely identifying the article. In each split, there are exactly two lines containing the same article_id, because two writers wrote questions for the same article.
  • set_unique_id: String. The unique ID corresponding to the set of questions, which corresponds to the line of json. Each set of questions is written by the same writer.
  • batch_num: String. The batch number. Our data collection is split in two groups, and there are three batches in each group. [i][j] means the j-th batch in the i-th group. For example, 23 corresponds to the third batch in the second group.
  • writer_id: String. The anonymized ID of the writer who wrote this set of questions.
  • source: String. The source of the article.
  • title: String. The title of the article.
  • author: String. The author of the article.
  • topic: String. The topic of the article.
  • url: String. The URL of the original unprocessed source article.
  • license: String. The license information for the article.
  • article: String. The HTML of the article. A script that converts HTML to plain texts is provided.
  • questions: A list of dictionaries explained below. Each line of json has a different number of questions because some questions were removed following validation.

As discussed, the value of questions is a list of dictionaries. Each dictionary has the following fields.

  • question: The question.
  • options: A list of four answer options.
  • gold_label: The correct answer, defined by a majority vote of 3 or 5 annotators + the original writer's label. The number corresponds to the option number (1-indexed) in options.
  • writer_label: The label the writer provided. The number corresponds to the option number (1-indexed) in options.
  • validation: A list of dictionaries containing the untimed validation results. Each dictionary contains the following fields.
    • untimed_annotator_id: The anonymized annotator IDs corresponding to the untimed validation results shown in untimed_answer.
    • untimed_answer: The responses in the untimed validation. Each question in the training set is annotated by three workers in most cases, and each question in the dev/test sets is annotated by five cases in most cases (see paper for exceptions).
    • untimed_eval1_answerability: The responses (represented numerically) to the first eval question in untimed validation. We asked the raters: “Is the question answerable and unambiguous?” The values correspond to the following choices:
      • 1: Yes, there is a single answer choice that is the most correct.
      • 2: No, two or more answer choices are equally correct.
      • 3: No, it is unclear what the question is asking, or the question or answer choices are unrelated to the passage.
    • untimed_eval2_context: The responses (represented numerically) to the second eval question in untimed validation. We asked the raters: “How much of the passage/text is needed as context to answer this question correctly?” The values correspond to the following choices:
      • 1: Only a sentence or two of context.
      • 2: At least a long paragraph or two of context.
      • 3: At least a third of the passage for context.
      • 4: Most or all of the passage for context.
    • untimed_eval3_distractor: The responses to the third eval question in untimed validation. We asked the raters: “Which of the options that you did not select was the best "distractor" item (i.e., an answer choice that you might be tempted to select if you hadn't read the text very closely)?” The numbers correspond to the option numbers (1-indexed).
  • speed_validation: A list of dictionaries containing the speed validation results. Each dictionary contains the following fields.
    • speed_annotator_id: The anonymized annotator IDs corresponding to the speed annotation results shown in speed_answer.
    • speed_answer: The responses in the speed validation. Each question is annotated by five workers.
  • difficult: A binary value. 1 means that less than 50% of the speed annotations answer the question correctly, so we include this question in the hard subset. Otherwise, the value is 0. In our evaluations, we report one accuracy figure for the entire dataset, and a second for the difficult=1 subset.

Validation criteria for the questions

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label (defined as the majority vote of validators' annotations together with the writer's provided label).
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.

What are the hard questions?

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label.
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
  • More than 50% of annotators answer the question incorrectly in the speed validaiton setting. That is, more than 50% of the speed_answer annotations are incorrect.

Test set

The annotations for questions in the test set will not be released. We are currently working on a leaderboard. Stay tuned for an update by early January!

Code

The code for our baseline models will be released soon. Stay tuned for an update by early January!

Citation

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

Contact

{yzpang, alicia.v.parrish}@nyu.edu

Owner
ML² AT CILVR
The Machine Learning for Language Group at NYU CILVR
ML² AT CILVR
Kaggle: Cell Instance Segmentation

Kaggle: Cell Instance Segmentation The goal of this challenge is to detect cells in microscope images. with simple view on how many cels have been ann

Jirka Borovec 9 Aug 12, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

632 Dec 13, 2022
Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code

Networks Learning 11 Dec 09, 2022
Adaout is a practical and flexible regularization method with high generalization and interpretability

Adaout Adaout is a practical and flexible regularization method with high generalization and interpretability. Requirements python 3.6 (Anaconda versi

lambett 1 Feb 09, 2022
In this project, we develop a face recognize platform based on MTCNN object-detection netcwork and FaceNet self-supervised network.

模式识别大作业——人脸检测与识别平台 本项目是一个简易的人脸检测识别平台,提供了人脸信息录入和人脸识别的功能。前端采用 html+css+js,后端采用 pytorch,

Xuhua Huang 5 Aug 02, 2022
ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

ManimML ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

259 Jan 04, 2023
Automatic library of congress classification, using word embeddings from book titles and synopses.

Automatic Library of Congress Classification The Library of Congress Classification (LCC) is a comprehensive classification system that was first deve

Ahmad Pourihosseini 3 Oct 01, 2022
LETR: Line Segment Detection Using Transformers without Edges

LETR: Line Segment Detection Using Transformers without Edges Introduction This repository contains the official code and pretrained models for Line S

mlpc-ucsd 157 Jan 06, 2023
An example of semantic segmentation using tensorflow in eager execution.

Semantic segmentation using Tensorflow eager execution Requirement Python 2.7+ Tensorflow-gpu OpenCv H5py Scikit-learn Numpy Imgaug Train with eager e

Iñigo Alonso Ruiz 25 Sep 29, 2022
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
Benchmark for the generalization of 3D machine learning models across different remeshing/samplings of a surface.

Discretization Robust Correspondence Benchmark One challenge of machine learning on 3D surfaces is that there are many different representations/sampl

Nicholas Sharp 10 Sep 30, 2022
Simple and understandable swin-transformer OCR project

swin-transformer-ocr ocr with swin-transformer Overview Simple and understandable swin-transformer OCR project. The model in this repository heavily r

Ha YongWook 67 Dec 31, 2022
a morph transfer UGATIT for image translation.

Morph-UGATIT a morph transfer UGATIT for image translation. Introduction 中文技术文档 This is Pytorch implementation of UGATIT, paper "U-GAT-IT: Unsupervise

55 Nov 14, 2022
Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.

Attendance-System-based-on-Facial-recognition-Attendance-data-stored-in-csv-file- Simple Python project using Opencv and datetime package to recognise

3 Aug 09, 2022
Source code, data, and evaluation details for “Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Formation, and Ramifications”

Analysis of cross-lingual citations in English papers Contents initial_analysis Source code, data, and evaluation details as published at ICADL2020 ci

Tarek Saier 1 Oct 27, 2022
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks This repository contains the official code for the

Linus Ericsson 11 Dec 16, 2022
Code for Environment Dynamics Decomposition (ED2).

ED2 Code for Environment Dynamics Decomposition (ED2). Installation Follow the installation in MBPO and Dreamer. Usage First follow the SD2 method for

0 Aug 10, 2021
Diverse Object-Scene Compositions For Zero-Shot Action Recognition

Diverse Object-Scene Compositions For Zero-Shot Action Recognition This repository contains the source code for the use of object-scene compositions f

7 Sep 21, 2022
Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras

Deep Learning Tutorial for Kaggle Ultrasound Nerve Segmentation competition, using Keras This tutorial shows how to use Keras library to build deep ne

Marko Jocić 922 Dec 19, 2022
A solution to the 2D Ising model of ferromagnetism, implemented using the Metropolis algorithm

Solving the Ising model on a 2D lattice using the Metropolis Algorithm Introduction The Ising model is a simplified model of ferromagnetism, the pheno

Rohit Prabhu 5 Nov 13, 2022