
Interscript

The Interscript dataset contains interactive user feedback on scripts generated by a T5-11B model.

[Figure: overview]


Dataset

  • data.json contains the data in an easy-to-read JSON format; data.jsonl contains the same data in JSONL format, with 8,466 samples, one per line (a minimal loading sketch follows the field descriptions below). Every sample is a JSON object with the following fields:
    {
        "input_script": "push chair in -> pull chair in; pull chair in -> push chair against wall; push chair against wall -> straighten chair legs; straighten chair legs -> Push all chairs in; line up the chairs -> push chair in",
        "input_feedback": "One would not pull chair in if they had initially pushed it in.",
        "output_script": "push chair against wall -> straighten chair legs;straighten chair legs -> Push all chairs in;line up the chairs -> push chair in;push chair in -> push chair against wall",
        "metadata": {
            "id": "301KG0KX9BKTC0HB7Z9SV1Y5HAFH2Y.2_implicit.gp",
            "goal": "push all chairs in",
            "is_distractor": false,
            "feedback_type": "implicit.gp",
            "edit": "Remove node 'pull chair in'",
            "input_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. pull chair in",
                "4. push chair against wall",
                "5. straighten chair legs",
                "6. Push all chairs in"
            ],
            "output_script_formatted": [
                "1. line up the chairs",
                "2. push chair in",
                "3. push chair against wall",
                "4. straighten chair legs",
                "5. Push all chairs in"
            ]
        }
    }

The description of the fields is as follows:

  1. input_script: The model-generated (erroneous) script $y_{bad}$, written as semicolon-separated "before -> after" edges (see the parsing sketch after this list).
  2. input_feedback: The user feedback $f$ on the input script.
  3. output_script: The corrected output script $y_{good}$.
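
The edge notation can be unfolded into an ordered step list (compare input_script_formatted in the metadata). Below is a minimal sketch, assuming that each "a -> b" edge means step a precedes step b, as the example above suggests; parse_script is a hypothetical helper, not part of this repository:

    from graphlib import TopologicalSorter  # standard library, Python 3.9+

    def parse_script(script: str) -> list[str]:
        """Unfold a semicolon-separated edge script into one ordered step list."""
        preds: dict[str, set[str]] = {}  # step -> steps that must come before it
        for edge in script.split(";"):
            before, after = (part.strip() for part in edge.split("->"))
            preds.setdefault(before, set())
            preds.setdefault(after, set()).add(before)
        # static_order() yields each step only after all of its predecessors
        return list(TopologicalSorter(preds).static_order())

On the input_script shown above this yields the six steps in the same order as input_script_formatted; in general it returns one valid ordering of the step graph.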

Metadata contains additional information about the sample. Some important fields are:

  1. id: Unique identifier of the sample.
  2. goal: Goal of the script.
  3. is_distractor: Whether the feedback is a distractor (please see Section 4 for more details).
  4. feedback_type: Type of feedback (please see Section 4 "Annotation" for more details).
  5. edit: The input_feedback expressed as an edit operation on the input script, i.e., the operation that transforms the input script into the output script.
  6. input_script_formatted: The input script presented as a list of sentences.
  7. output_script_formatted: The output script presented as a list of sentences.
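
As a minimal loading sketch (assuming only the Python standard library and the data.jsonl layout described above; the [FEEDBACK] separator in the last lines is a made-up example, not the paper's serialization):

    import json

    # data.jsonl holds one JSON object per line, 8,466 samples in total.
    with open("data.jsonl") as f:
        samples = [json.loads(line) for line in f]

    sample = samples[0]
    print(sample["metadata"]["goal"])   # e.g. "push all chairs in"
    print(sample["input_feedback"])     # the user's natural-language feedback

    # A hypothetical serialization for a feedback-driven repair model
    # (the separator token and format are assumptions, not the paper's):
    source = sample["input_script"] + " [FEEDBACK] " + sample["input_feedback"]
    target = sample["output_script"]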

Data collection process

  • We use Amazon Mechanical Turk to collect user feedback on erroneous scripts.
  • An overview of the process is captured in the following figure:

[Figure: data collection process]

Amazon Mechanical Turk Template
