INSPIRED: A Transparent Dialogue Dataset for Interactive Semantic Parsing

Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer. To do so, we construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset.

This repository will contain the dataset and code for our paper "Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction".

Data

Dataset Download

The dataset is available in this repository at: ./data/dataset.jsonl

Data Structure

The dataset file is in JSON Lines format: each line is one dictionary (JSON object) with the following keys:

{
    "id": "ID number",
    "cwq_question": "Original complex question in the CWQ dataset",
    "rephrased_question": "Complex question as rephrased by workers",
    "rephrased_question_label": "'Replacement' or 'Alternative'",
    "question": "Same as rephrased_question if rephrased_question_label is 'Replacement'; otherwise, same as cwq_question",
    "final_answer": "Final answer to the complex question",
    "gold_parse": "Gold SPARQL query for the complex question",
    "preprocessed_gold_parse": "Preprocessed gold parse with entities and prefix replaced",
    "predicted_parse": "SPARQL query predicted by the initial semantic parser",
    "gold_sub_lfs": "A list of gold sub-logical forms after decomposition",
    "pred_sub_lfs": "A list of predicted sub-logical forms after decomposition",
    "gold_sub_qs": [
        {
          "sub_id": "ID of the sub-question",
          "sub_question": "Rephrased sub-question",
          "temp_sub_question": "Templated sub-question for the gold sub-logical form",
          "answer": "Answer to the sub-question"
        }, "..."],
    "pred_sub_qs": [
        {
          "sub_id": "ID of the sub-question",
          "sub_question": "Rephrased sub-question",
          "temp_sub_question": "Templated sub-question for the predicted sub-logical form",
          "answer": "Answer to the sub-question"
        }, "..."],
    "feedback": "A list of human feedback"
}
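As a rough illustration of how the structure above might be read, here is a minimal Python sketch. The helper names (`iter_records`, `expected_question`, `preview`) are hypothetical and not part of this repository. The sketch assumes the file holds one JSON object per line and that `rephrased_question_label` is stored as the plain string `Replacement` or `Alternative`; check both assumptions against the actual file.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dictionary per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def expected_question(record: dict) -> str:
    """Re-derive the "question" value from the rule described above.

    Assumption: the label is the plain string "Replacement" or "Alternative".
    """
    if record["rephrased_question_label"].strip() == "Replacement":
        return record["rephrased_question"]
    return record["cwq_question"]


def preview(path: str = "./data/dataset.jsonl", limit: int = 1) -> None:
    """Print the question and gold sub-questions of the first few records."""
    with open(path, encoding="utf-8") as f:
        for i, record in enumerate(iter_records(f)):
            if i >= limit:
                break
            print(record["id"], expected_question(record))
            for sub in record["gold_sub_qs"]:
                print("  ", sub["sub_id"], sub["sub_question"], "->", sub["answer"])
```

Calling `preview()` from the repository root would print the first record. `expected_question` only mirrors the documented rule, so comparing its result with the stored `question` value is one way to sanity-check a download.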
