Multi-angle c(q)uestion answering

Related tags

Deep Learningmacaw
Overview

Macaw

Introduction

Macaw (Multi-angle c(q)uestion answering) is a ready-to-use model capable of general question answering, showing robustness outside the domains it was trained on. It has been trained in "multi-angle" fashion, which means it can handle a flexible set of input and output "slots" (like question, answer, explanation) .

Macaw was built on top of T5 and comes in different sizes: macaw-11b, macaw-3b, and macaw-large, as well as an answer-focused version featured on various leaderboards: macaw-answer-11b (see below).

Examples

Some suggestive examples from the Macaw (11B) model, for different angles:

  • (Q→A) Given a question, what's the answer?
    Q: James went camping in the woods, but forgot to bring a hammer to bang the tent pegs in. What else might he use?
    → A: rocks

  • (QM→A) Given a question and answer choices, what's the answer?
    Q: James went camping in the woods, but forgot to bring a hammer to bang the tent pegs in. What else might he use?
    M: (A) a leaf (B) a log (C) a worm
    → A: a log

  • (Q→AE) Given a question, what's the answer and an explanation?
    Q: Which force pulls objects to the ground?
    → A: gravity
    → E: Gravitational force causes objects that have mass to be pulled down on a planet.

  • (A→QE) Given an answer, what's a plausible question and explanation?
    A: elephant
    → Q: Which animal has the largest ears?
    → E: The ears of an elephant are the largest.

  • (C→QA) Given a context, what's a plausible question and answer?
    C: A car needs a battery to start.
    → Q: What is required for a car to start?
    → A: battery

For many more examples of the basic Q→A angle, see examples.md.

Usage examples

Macaw can easily be used in the Hugging Face transformers library, as shown here for the smallest model (the smallest model is not generally recommended, but has much smaller footprint), where given a question we want to return an answer and suggested multiple-choice answer options.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("allenai/macaw-large")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/macaw-large")
input_string = "$answer$ ; $mcoptions$ ; $question$ = What is the color of a cloudy sky?"
input_ids = tokenizer.encode(input_string, return_tensors="pt")
output = model.generate(input_ids, max_length=200)

>>> tokenizer.batch_decode(output, skip_special_tokens=True)
['$answer$ = gray ; $mcoptions$ = (A) blue (B) white (C) grey (D) white']

(run pip install -r requirements.txt if any dependencies are missing). Note there's no guarantee the different slots are fully coherent, as in gray/grey (and duplicate "white") here, more so for the macaw-large model vs the larger ones.

The code in macaw/utils.py includes some convenience wrappers, such as load_model and run_macaw, here are some examples loading the macaw-11b model onto two GPUs (need around 48GB total GPU memory for the largest model to work):

from macaw.utils import load_model, run_macaw
model_dict = load_model("allenai/macaw-11b", cuda_devices=[0,1])
res1 = run_macaw("Q: Which force pulls objects to the ground?\nA\nE", model_dict)
# Alternate input syntax
res2 = run_macaw({"Q:":"Which force causes a compass needle to point north?", "A":""}, model_dict)
# Add sampling options for the output
res3 = run_macaw("Q: Which force pulls objects to the ground?\nA\nE", model_dict, {"do_sample": True, "temperature": 2.0})

>>> [print(res["output_slots_list"][0]) for res in [res1, res2, res3]]
{'answer': 'gravity', 'explanation': 'Gravitational force causes objects that have mass to be pulled down on a planet.'}
{'answer': 'magnetism'}
{'answer': 'gravitional force', 'explanation': 'Gravitational force causes objects that have mass to be pulled down on a planet.'}

For batch evaluation of instances at various angles, see macaw/batch_eval.py for pointers.

Supported slots

Here are the slots available in Macaw, generally applicable for both input and output:

Slot name Description Example
question (Q) Question text What is the color of a cloudy sky?
answer (A) Answer text The sky is blue
mcoptions (M) Multiple-choice answer options (A) blue (B) white (C) grey
context (C) Potentially relevant context (noisy IR) The sky looks blue to us because...
explanation (E) Sentences explaining the answer A cloudy sky is usually gray in color...

An angle is a specific set of input/output slots, for instance QM->AE is the task of producing answer and explanation, given a question and multiple-choice options. Macaw is trained on a wide variety of angles and handles unseen angles as well, one exception is that the context (C) only appears as an input slot in the training data.

The Challenge300 dataset of probing questions

The Challenge300 dataset of 300 diverse probing examples can be found in challenge300-probes-v1.jsonl. The basic Q→A output from Macaw (at different sizes), as well as outputs from GPT3, Jurassic-1 and alternate T5 models trained on NaturalQuestions, can be seen in examples.md.

Demo

See DEMO.md for instructions and code to host an interactive version of Macaw.

Training data

Macaw was trained in two steps from the text-to-text transformer model T5:

  1. Multi-angle version of UnifiedQA by fine-tuning T5 on the following 7 datasets and associated angles:

  2. Further fine-tuning of Multi-Angle UnifiedQA on multiple-choice and direct-answer elementary science questions, along with (up to 5) explanation sentences from WorldTreeV2:

    • ARC: QMC→AE, AQC→M, QMEC→A, QME→A, QE→A, QMC→A, QC→AE, QM→AE, QMAC→E, QMA→E
    • ARC-DA: QC→AE, Q→AE, QC→A, Q→A, QEC→A, QE→A, AE→Q, AC→Q, QA→E, AQC→E
  3. A specialized answer-focused model, macaw-answer-11b (called "UnifiedQA + ARC MC/DA + IR" on the leaderboards for ARC, ARC-Easy, and ARC-DA) was trained on a smaller set of angles, not including explanations:

    • ARC: QMC→A, QAC→M, QC→A, QM→A, MAC→Q, AC→QM, M→QA
    • ARC-DA: QC→A, Q→A, AC→Q, C→QA

Available models

The Macaw models can be accessed from the Hugging Face model hub:

For a sense of the degradation in performance for the smaller sizes, here are baseline scores on the ARC Challenge and ARC Easy multiple-choice development questions. Included are variants with and without IR context from a large science corpus (corresponding to angles QMC→A and QM→A respectively).

Model ARC Challenge ARC Challenge (no IR) ARC Easy ARC Easy (no IR)
Macaw (11B) 76.9 74.6 91.2 84.9
Macaw-3B 68.2 67.9 87.9 77.7
Macaw-large 57.2 50.5 82.5 63.9
Macaw-answer (11B) 79.9 75.2 90.5 85.8

Disclaimer

As a model capable of generating free form text, the output of the model is not guaranteed to be free of offensive material, so appropriate caution is advised when using the model.

Citation

If you use Macaw in your work, please reference the related paper using

@article{Tafjord2021Macaw,
  title={General-Purpose Question-Answering with {M}acaw},
  author={Oyvind Tafjord and Peter Clark},
  journal={ArXiv},
  year={2021},
  volume={abs/2109.02593}
}
The most simple and minimalistic navigation dashboard.

Navigation This project follows a goal to have simple and lightweight dashboard with different links. I use it to have my own self-hosted service dash

Yaroslav 23 Dec 23, 2022
Best Practices on Recommendation Systems

Recommenders What's New (February 4, 2021) We have a new relase Recommenders 2021.2! It comes with lots of bug fixes, optimizations and 3 new algorith

Microsoft 14.8k Jan 03, 2023
Pytorch implementation of DeePSiM

Pytorch implementation of DeePSiM

1 Nov 05, 2021
Official Repository for our ICCV2021 paper: Continual Learning on Noisy Data Streams via Self-Purified Replay

Continual Learning on Noisy Data Streams via Self-Purified Replay This repository contains the official PyTorch implementation for our ICCV2021 paper.

Jinseo Jeong 22 Nov 23, 2022
Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021)

Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021) In this repository we provide PyTorch implementations for GeMCL; a

4 Apr 15, 2022
CSD: Consistency-based Semi-supervised learning for object Detection

CSD: Consistency-based Semi-supervised learning for object Detection (NeurIPS 2019) By Jisoo Jeong, Seungeui Lee, Jee-soo Kim, Nojun Kwak Installation

80 Dec 15, 2022
Pseudo-Visual Speech Denoising

Pseudo-Visual Speech Denoising This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021. Autho

Sindhu 94 Oct 22, 2022
Serverless proxy for Spark cluster

Hydrosphere Mist Hydrosphere Mist is a serverless proxy for Spark cluster. Mist provides a new functional programming framework and deployment model f

hydrosphere.io 317 Dec 01, 2022
PyTorch implementation code for the paper MixCo: Mix-up Contrastive Learning for Visual Representation

How to Reproduce our Results This repository contains PyTorch implementation code for the paper MixCo: Mix-up Contrastive Learning for Visual Represen

opcrisis 46 Dec 15, 2022
A universal framework for learning timestamp-level representations of time series

TS2Vec This repository contains the official implementation for the paper Learning Timestamp-Level Representations for Time Series with Hierarchical C

Zhihan Yue 284 Dec 30, 2022
FastFace: Lightweight Face Detection Framework

Light Face Detection using PyTorch Lightning

Ömer BORHAN 75 Dec 05, 2022
Replication attempt for the Protein Folding Model

RGN2-Replica (WIP) To eventually become an unofficial working Pytorch implementation of RGN2, an state of the art model for MSA-less Protein Folding f

Eric Alcaide 36 Nov 29, 2022
Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System

Does Oversizing Improve Prosumer Profitability in a Flexibility Market? - A Sensitivity Analysis using PV-battery System The possibilities to involve

Babu Kumaran Nalini 0 Nov 19, 2021
A toolkit for Lagrangian-based constrained optimization in Pytorch

Cooper About Cooper is a toolkit for Lagrangian-based constrained optimization in Pytorch. This library aims to encourage and facilitate the study of

Cooper 34 Jan 01, 2023
Repositório da disciplina de APC, no segundo semestre de 2021

NOTAS FINAIS: https://github.com/fabiommendes/apc2018/blob/master/nota-final.pdf Algoritmos e Programação de Computadores Este é o Git da disciplina A

16 Dec 16, 2022
Compact Bidirectional Transformer for Image Captioning

Compact Bidirectional Transformer for Image Captioning Requirements Python 3.8 Pytorch 1.6 lmdb h5py tensorboardX Prepare Data Please use git clone --

YE Zhou 19 Dec 12, 2022
🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum 🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to t

Hugging Face 842 Dec 30, 2022
Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020) Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, A

roei_herzig 24 Jul 07, 2022
This is a official repository of SimViT.

SimViT This is a official repository of SimViT. We will open our models and codes about object detection and semantic segmentation soon. Our code refe

ligang 57 Dec 15, 2022
[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Enjoy-Hamburger 🍔 Official implementation of Hamburger, Is Attention Better Than Matrix Decomposition? (ICLR 2021) Under construction. Introduction T

Gsunshine 271 Dec 29, 2022