TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.


TEACh

Task-driven Embodied Agents that Chat

Aishwarya Padmakumar*, Jesse Thomason*, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment. The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE). Please include appropriate licensing and attribution when using our data and code, and please cite our paper.

Prerequisites

  • python3 >=3.7,<=3.8
  • python3.x-dev, example: sudo apt install python3.8-dev
  • tmux, example: sudo apt install tmux
  • xorg, example: sudo apt install xorg openbox
  • ffmpeg, example: sudo apt install ffmpeg

Installation

pip install -r requirements.txt
pip install -e .

Downloading the dataset

Run the following script:

teach_download 

This will download and extract the archive files (experiment_games.tar.gz, all_games.tar.gz, images_and_states.tar.gz, edh_instances.tar.gz & tfd_instances.tar.gz) in the default directory (/tmp/teach-dataset).
Optional arguments:

  • -d/--directory: The location in which to store the dataset. Default=/tmp/teach-dataset.
  • -se/--skip-extract: If set, skip extracting archive files.
  • -sd/--skip-download: If set, skip downloading archive files.
  • -f/--file: Specify the file name to be retrieved from S3 bucket.
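
After a download, it can be useful to confirm that every archive arrived. The following is a minimal sketch, not part of the TEACh package: the helper name missing_archives is hypothetical, the archive names are the ones listed above, and it assumes the .tar.gz files are still present in the target directory (for example after a run with --skip-extract).

```python
from pathlib import Path

# Archive names as listed in this README.
EXPECTED_ARCHIVES = (
    "experiment_games.tar.gz",
    "all_games.tar.gz",
    "images_and_states.tar.gz",
    "edh_instances.tar.gz",
    "tfd_instances.tar.gz",
)


def missing_archives(directory="/tmp/teach-dataset"):
    """Return the expected archive names that are not files in `directory`.

    Hypothetical helper for illustration; teach_download itself may
    organize or clean up its downloads differently.
    """
    root = Path(directory)
    return [name for name in EXPECTED_ARCHIVES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_archives()
    if missing:
        print("Missing archives:", ", ".join(missing))
    else:
        print("All expected archives are present.")
```

Any archive reported as missing could then be fetched individually with the -f/--file option.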

Remote Server Setup

If you are running on a remote server without a display, the following setup is needed to run episode replay, model inference, or any model training that invokes the simulator (student forcing / RL).

Start an X-server

tmux
sudo python ./bin/startx.py

Exit the tmux session (CTRL+B, D). Any other commands should be run in the main terminal / different sessions.

Replaying episodes

Most users should not need to do this since we provide this output in images_and_states.tar.gz.

The following steps can be used to read a .json file of a gameplay session, play it in the AI2-THOR simulator, and at each time step save egocentric observations of the Commander and Driver (Follower in the paper). It also saves the target object panel and mask seen by the Commander, and the difference between current and initial state.

Replaying a single episode locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_fn /path/to/game/file \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--status-out-fn /path/to/desired/output/status/file.json

Note that --status-out-fn must end in .json. By default, the script will not replay sessions for which an output subdirectory already exists under --write_frames_dir. Additionally, if the file passed to --status-out-fn already exists, the script will try to resume any files not marked as replayed in that file. It will error out if the status file and the output directories disagree on which sessions have previously been replayed. For additional runs that are not intended to resume from a previous one, use a new --write_frames_dir and a new --status-out-fn.
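
The constraints above can be checked before launching a long replay. This is a sketch under stated assumptions: check_replay_outputs is a hypothetical helper, not part of teach_replay, and it only mirrors the rules described in this README (the .json suffix, resume behavior when the status file exists, and skipping of sessions whose output subdirectory exists).

```python
from pathlib import Path


def check_replay_outputs(status_out_fn, write_frames_dir):
    """Return a list of warnings for a planned fresh teach_replay run.

    Hypothetical pre-flight check; an empty list means the paths look
    suitable for a run that is not meant to resume a previous one.
    """
    warnings = []
    status_path = Path(status_out_fn)
    frames_dir = Path(write_frames_dir)

    # The status file name must end in .json.
    if not str(status_out_fn).endswith(".json"):
        warnings.append("status file name must end in .json")

    # An existing status file makes the script try to resume from it.
    if status_path.exists():
        warnings.append("status file already exists; the run would resume from it")

    # Sessions with an existing output subdirectory are not replayed.
    if frames_dir.is_dir() and any(frames_dir.iterdir()):
        warnings.append("frames directory is not empty; existing sessions would be skipped")

    return warnings


if __name__ == "__main__":
    for message in check_replay_outputs("/tmp/replay_status.json", "/tmp/replay_frames"):
        print("warning:", message)
```

The paths in the usage example are placeholders; substitute the values you plan to pass to --status-out-fn and --write_frames_dir.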

Replay all episodes in a folder locally, or in a new tmux session / main terminal of remote headless server:

teach_replay \
--game_dir /path/to/dir/containing/.game.json/files \
--write_frames_dir /path/to/desired/output/images/dir \
--write_frames \
--write_states \
--num_processes 50 \
--status-out-fn /path/to/desired/output/status/file.json

To generate a video, additionally specify --create_video. Note that for images to be saved, --write_frames must be specified and --write_frames_dir must be provided. For state changes to be saved, --write_states must be specified and --write_frames_dir must be provided.

Evaluation

We include two sample scripts for inference and calculation of metrics: teach_inference and teach_eval. teach_inference is a wrapper that loads EDH instances, interacts with the simulator, and writes the game file and predicted action sequence as JSON files after each inference run. It dynamically loads the model based on the --model_module and --model_class arguments. Your model has to implement teach.inference.teach_model.TeachModel. See teach.inference.sample_model.SampleModel for an example implementation that takes random actions at every time step.
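
Dynamic loading from a module path and a class name is commonly done with Python's importlib. The sketch below illustrates that general technique; it is not taken from the teach_inference source, and load_model_class is a hypothetical name.

```python
import importlib


def load_model_class(model_module, model_class):
    """Import `model_module` and return the attribute named `model_class`.

    Illustrative only: this mirrors what the --model_module and
    --model_class arguments describe, not the actual TEACh code.
    """
    module = importlib.import_module(model_module)
    try:
        return getattr(module, model_class)
    except AttributeError as exc:
        raise ImportError(
            f"module {model_module!r} has no attribute {model_class!r}"
        ) from exc


if __name__ == "__main__":
    # A standard-library class is used here so the example runs without TEACh installed.
    cls = load_model_class("collections", "OrderedDict")
    print(cls.__name__)
```

With TEACh installed, the same call with "teach.inference.sample_model" and "SampleModel" would be expected to return the sample model class named above.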

After running teach_inference, use teach_eval to compute metrics based on the output data produced by teach_inference.

Sample run:

# The default directory from "Downloading the dataset" is /tmp/teach-dataset
export DATA_DIR=/path/to/data/with/games/and/edh_instances/as/subdirs
export OUTPUT_DIR=/path/to/output/folder/for/split
export METRICS_FILE=/path/to/output/metrics/file_without_extension

teach_inference \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE \
    --model_module teach.inference.sample_model \
    --model_class SampleModel

teach_eval \
    --data_dir $DATA_DIR \
    --inference_output_dir $OUTPUT_DIR \
    --split valid_seen \
    --metrics_file $METRICS_FILE

Security

See CONTRIBUTING for more information.

License

The code is licensed under the MIT License (see SOFTWARELICENSE), images are licensed under Apache 2.0 (see IMAGESLICENSE) and other data files are licensed under CDLA-Sharing 1.0 (see DATALICENSE).

Owner
Alexa