This repo in the implementation of EMNLP'21 paper "SPARQLing Database Queries from Intermediate Question Decompositions" by Irina Saparina, Anton Osokin

Overview

SPARQLing Database Queries from Intermediate Question Decompositions

This repo is the implementation of the following paper:

SPARQLing Database Queries from Intermediate Question Decompositions
Irina Saparina and Anton Osokin
To appear in proceedings of EMNLP'21

License

This software is released under the MIT license, which means that you can use the code in any way you want.

Dependencies

Conda env with pytorch 1.9

Create conda env with pytorch 1.9 and many other packages upgraded: conda_env_with_pytorch1.9.yaml:

conda env create -n env-torch1.9 -f conda_env_with_pytorch1.9.yaml
conda activate env-torch1.9

Download some nltk resourses, Bert and GraPPa:

python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
python -c "from transformers import AutoModel; AutoModel.from_pretrained('bert-large-uncased-whole-word-masking'); AutoModel.from_pretrained('Salesforce/grappa_large_jnt')"

mkdir -p third_party && \
cd third_party && \
curl https://nlp.stanford.edu/software/stanford-corenlp-full-2018-10-05.zip | jar xv

Data

We currently provide both Spider and Break inside our repos. Note that datasets differ from original ones as we fixed some annotation errors. Download databases:

bash ./utils/wget_gdrive.sh spider_temp.zip 11icoH_EA-NYb0OrPTdehRWm_d7-DIzWX
unzip spider_temp.zip -d spider_temp
cp -r spider_temp/spider/database ./data/spider
rm -rf spider_temp/
python ./qdmr2sparql/fix_databases.py --spider_path ./data/spider

To reproduce our annotation procedure see qdmr2sparql/README.md.

For testing qdmr2sparql translator run qdmr2sparql/test_qdmr2sparql.py

Experiments

Every experiment has its own config file in text2qdmr/configs/experiments. The pipeline of working with any model version or dataset is:

python run_text2qdmr.py preprocess experiment_config_file  # preprocess the data
python run_text2qdmr.py train experiment_config_file       # train a model
python run_text2qdmr.py eval experiment_config_file        # evaluate the results

# multiple GPUs on one machine:
export NGPUS=4 # set $NGPUS manually
python -m torch.distributed.launch --nproc_per_node=$NGPUS --use_env --master_port `./utils/get_free_port.sh`  run_text2qdmr.py train experiment_config_file

Note that preprocessing and evaluation use execution and take some time. To speed up the evaluation, you can install Virtuoso server (see qdmr2sparql/README_Virtuoso.md).

Checkpoints and samples

The dev and test examples of model output are model_samples/.

Checkpoints of our best models:

Model name Dev Test Link
grappa-aug 80.4 62.0 https://www.dropbox.com/s/t9z1uwvohuakig8/grappa-aug_model_checkpoint-00072000?dl=0
grappa-full_break 74.6 62.6 https://www.dropbox.com/s/bf6vyhtep4knmm7/full-break-grappa_model_checkpoint-00075000?dl=0

Acknowledgements

Text2qdmr module is based on RAT-SQL code, the implementation of ACL'20 paper "RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers" by Wang et al.

Spider dataset was proposed by Yi et al. in EMNLP'18 paper "Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task".

Break dataset was proposed by Wolfson et al. in TACL paper "Break It Down: A Question Understanding Benchmark".

[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

Few-shot 3D Point Cloud Semantic Segmentation Created by Na Zhao from National University of Singapore Introduction This repository contains the PyTor

117 Dec 27, 2022
RANZCR-CLiP 7th Place Solution

RANZCR-CLiP 7th Place Solution This repository is WIP. (18 Mar 2021) Installation git clone https://github.com/analokmaus/kaggle-ranzcr-clip-public.gi

Hiroshechka Y 21 Oct 22, 2022
Hypercomplex Neural Networks with PyTorch

HyperNets Hypercomplex Neural Networks with PyTorch: this repository would be a container for hypercomplex neural network modules to facilitate resear

Eleonora Grassucci 21 Dec 27, 2022
Code for paper 'Hand-Object Contact Consistency Reasoning for Human Grasps Generation' at ICCV 2021

GraspTTA Hand-Object Contact Consistency Reasoning for Human Grasps Generation (ICCV 2021). Project Page with Videos Demo Quick Results Visualization

Hanwen Jiang 47 Dec 09, 2022
A tensorflow implementation of an HMM layer

tensorflow_hmm Tensorflow and numpy implementations of the HMM viterbi and forward/backward algorithms. See Keras example for an example of how to use

Zach Dwiel 283 Oct 19, 2022
Construct a neural network frame by Numpy

本项目的CSDN博客链接:https://blog.csdn.net/weixin_41578567/article/details/111482022 1. 概览 本项目主要用于神经网络的学习,通过基于numpy的实现,了解神经网络底层前向传播、反向传播以及各类优化器的原理。 该项目目前已实现的功

24 Jan 22, 2022
BankNote-Net: Open dataset and encoder model for assistive currency recognition

BankNote-Net: Open Dataset for Assistive Currency Recognition Millions of people around the world have low or no vision. Assistive software applicatio

Microsoft 13 Oct 28, 2022
WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose

WHENet: Real-time Fine-Grained Estimation for Wide Range Head Pose Yijun Zhou and James Gregson - BMVC2020 Abstract: We present an end-to-end head-pos

368 Dec 26, 2022
GuideDog is an AI/ML-based mobile app designed to assist the lives of the visually impaired, 100% voice-controlled

Guidedog Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma GuideDog is an AI/ML-based mobile app designed to assist the lives of the visua

Kyuhee Jo 5 Nov 24, 2021
MMRazor: a model compression toolkit for model slimming and AutoML

Documentation: https://mmrazor.readthedocs.io/ English | 简体中文 Introduction MMRazor is a model compression toolkit for model slimming and AutoML, which

OpenMMLab 899 Jan 02, 2023
Jingju baseline - A baseline model of our project of Beijing opera script generation

Jingju Baseline It is a baseline of our project about Beijing opera script gener

midon 1 Jan 14, 2022
[CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers This is the official PyTorch implementation and models for UP-DETR paper: @a

dddzg 430 Dec 23, 2022
Artificial intelligence technology inferring issues and logically supporting facts from raw text

개요 비정형 텍스트를 학습하여 쟁점별 사실과 논리적 근거 추론이 가능한 인공지능 원천기술 Artificial intelligence techno

6 Dec 29, 2021
[Link]mareteutral - pars tradg wth M []

pairs-trading-with-ML Jonathan Larkin, August 2017 One popular strategy classification is Pairs Trading. Though this category of strategies can exhibi

Jonathan Larkin 134 Jan 06, 2023
Iterative Normalization: Beyond Standardization towards Efficient Whitening

IterNorm Code for reproducing the results in the following paper: Iterative Normalization: Beyond Standardization towards Efficient Whitening Lei Huan

Lei Huang 21 Dec 27, 2022
ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

ARKitScenes This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D

Apple 371 Jan 05, 2023
Fantasy Points Prediction and Dream Team Formation

Fantasy-Points-Prediction-and-Dream-Team-Formation Collected Data from open source resources that have over 100 Parameters for predicting cricket play

Akarsh Singh 2 Sep 13, 2022
Point detection through multi-instance deep heatmap regression for sutures in endoscopy

Suture detection PyTorch This repo contains the reference implementation of suture detection model in PyTorch for the paper Point detection through mu

artificial intelligence in the area of cardiovascular healthcare 3 Jul 16, 2022
GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery

GLODISMO: Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery This is the code to the paper: Gradient-Based Learn

3 Feb 15, 2022
Self-training with Weak Supervision (NAACL 2021)

This repo holds the code for our weak supervision framework, ASTRA, described in our NAACL 2021 paper: "Self-Training with Weak Supervision"

Microsoft 148 Nov 20, 2022