This repository contains the test-dev data for the paper "xGQA: Cross-lingual Visual Question Answering".

Last update: Dec 09, 2022

xGQA

xGQA builds on GQA, introduced by Hudson et al. (2019) in "GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering". The training data can be downloaded here.

Overview

The repository is structured as follows:

data/zero_shot/ contains the xGQA test-dev files for all 8 languages.
data/few_shot/ contains the new standard splits for few-shot learning. The number in each file name indicates how many distinct images the split includes; for example, train_10.json contains questions about 10 distinct images.
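
The file-naming convention described above can be read programmatically. The following is a minimal sketch, assuming every few-shot training file follows the `train_<N>.json` pattern of the single example given; the helper name `image_count_from_name` is hypothetical and not part of this repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed pattern, generalized from the one documented example "train_10.json".
_SPLIT_NAME = re.compile(r"^train_(\d+)\.json$")


def image_count_from_name(path: str) -> Optional[int]:
    """Return the number of distinct images encoded in a few-shot file name.

    Returns None when the file name does not match the assumed pattern.
    """
    match = _SPLIT_NAME.match(Path(path).name)
    return int(match.group(1)) if match else None


print(image_count_from_name("data/few_shot/train_10.json"))  # 10
```

Returning `None` for non-matching names keeps the helper safe to run over a directory that may also hold other files.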

Training Data

Please download the English training data of GQA (Hudson et al. 2019) here.

Zero-Shot Results

Zero-shot transfer results on xGQA when transferring from English GQA. Average accuracy is reported. The mean column averages over the seven target languages only; the source language (English) is excluded.

model	en	de	pt	ru	id	bn	ko	zh	mean
M3P	58.43	23.93	24.37	20.37	22.57	15.83	16.90	18.60	20.37
OSCAR+Emb	62.23	17.35	19.25	10.52	18.26	14.93	17.10	16.41	16.26
OSCAR+Ada	60.30	18.91	27.02	17.50	18.77	15.42	15.28	14.96	18.27
mBERTAda	56.25	29.76	30.37	24.42	19.15	15.12	19.09	24.86	23.25
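
To make the definition of the mean column concrete, the sketch below recomputes it from the per-language accuracies in the table above, leaving out English. The names `LANGS`, `RESULTS`, and `target_mean` are illustrative only; the numbers are copied from the table.

```python
from statistics import mean

# Column order of the table above; "en" is the source language.
LANGS = ["en", "de", "pt", "ru", "id", "bn", "ko", "zh"]

# Per-language accuracies copied from the table above.
RESULTS = {
    "M3P": [58.43, 23.93, 24.37, 20.37, 22.57, 15.83, 16.90, 18.60],
    "OSCAR+Emb": [62.23, 17.35, 19.25, 10.52, 18.26, 14.93, 17.10, 16.41],
    "OSCAR+Ada": [60.30, 18.91, 27.02, 17.50, 18.77, 15.42, 15.28, 14.96],
    "mBERTAda": [56.25, 29.76, 30.37, 24.42, 19.15, 15.12, 19.09, 24.86],
}


def target_mean(scores, source="en"):
    """Average accuracy over all languages except the source language."""
    targets = [s for lang, s in zip(LANGS, scores) if lang != source]
    return round(mean(targets), 2)


for model, scores in RESULTS.items():
    print(model, target_mean(scores))
```

The recomputed values (20.37, 16.26, 18.27, 23.25) agree with the mean column, which confirms that English is excluded from the average.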

Few-Shot

Few-shot dataset sizes. The GQA test-dev set is split into new development and test sets, plus training splits of different sizes. We maintain the distribution of structural types in each split.

Set	Test	Dev	Train	Train	Train	Train	Train	Train
#Images	300	50	1	5	10	20	25	48
#Questions	9666	1422	27	155	317	594	704	1490
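
As a quick sanity check on the split sizes, the sketch below derives the average number of questions per image for each split. The figures are copied from the table above; the structure `SPLITS` and the function `questions_per_image` are illustrative names, not part of the repository.

```python
# (set name, number of images, number of questions), copied from the table above.
SPLITS = [
    ("test", 300, 9666),
    ("dev", 50, 1422),
    ("train", 1, 27),
    ("train", 5, 155),
    ("train", 10, 317),
    ("train", 20, 594),
    ("train", 25, 704),
    ("train", 48, 1490),
]


def questions_per_image(images: int, questions: int) -> float:
    """Average number of questions asked about each image in a split."""
    return questions / images


for name, images, questions in SPLITS:
    ratio = questions_per_image(images, questions)
    print(f"{name:5s} images={images:3d} questions/image={ratio:.2f}")
```

By this calculation every split averages roughly 27 to 32 questions per image, so the smaller training splits are comparable in density to the test and development sets.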

Citation

If you find this repository helpful, please cite our paper "xGQA: Cross-lingual Visual Question Answering":

@article{pfeiffer-etal-2021-xGQA,
    title = {{xGQA: Cross-Lingual Visual Question Answering}},
    author = {Jonas Pfeiffer and Gregor Geigle and Aishwarya Kamath and Jan-Martin O. Steitz and Stefan Roth and Ivan Vuli{\'{c}} and Iryna Gurevych},
    journal = {arXiv preprint},
    year = {2021},
    url = {https://arxiv.org/pdf/2109.06082.pdf}
}

This work is licensed under a Creative Commons Attribution 4.0 International License.

Owner

AdapterHub
