Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Related tags

Text Data & NLPembert
Overview

EmBERT: A Transformer Model for Embodied, Language-guided Visual Task Completion

We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion. Additionally, we bridge the gap between successful object-centric navigation models used for non-interactive agents and the language-guided visual task completion benchmark, ALFRED, by introducing object navigation targets for EmBERT training. We achieve competitive performance on the ALFRED benchmark, and EmBERT marks the first transformer-based model to successfully handle the long-horizon, dense, multi-modal histories of ALFRED, and the first ALFRED model to utilize object-centric navigation targets.

In this repository, we provide the entire codebase which is used for training and evaluating EmBERT performance on the ALFRED dataset. It's mostly based on AllenNLP and PyTorch-Lightning therefore it's inherently easily to extend.

Setup

We used Anaconda for our experiments. Please create an anaconda environment and then install the project dependencies with the following command:

pip install -r requirements.txt

As next step, we will download the ALFRED data using the script scripts/download_alfred_data.sh as follows:

sh scripts/donwload_alfred_data.sh json_feat

Before doing so, make sure that you have installed p7zip because is used to extract the trajectory files.

MaskRCNN fine-tuning

We provide the code to fine-tune a MaskRCNN model on the ALFRED dataset. To create the vision dataset, use the script scripts/generate_vision_dataset.sh. This will create the dataset splits required by the training process. After this, it's possible to run the model fine-tuning using:

PYTHONPATH=. python vision/finetune.py --batch_size 8 --gradient_clip_val 5 --lr 3e-4 --gpus 1 --accumulate_grad_batches 2 --num_workers 4 --save_dir storage/models/vision/maskrcnn_bs_16_lr_3e-4_epochs_46_7k_batches --max_epochs 46 --limit_train_batches 7000

We provide this code for reference however in our experiments we used the MaskRCNN model from MOCA which applies more sophisticated data augmentation techniques to improve performance on the ALFRED dataset.

ALFRED Visual Features extraction

MaskRCNN

The visual feature extraction script is responsible for generating the MaskRCNN features as well as orientation information for every bounding box. For the MaskrCNN model, we use the pretrained model from MOCA. You can download it from their GitHub page. First, we create the directory structure and then download the model weights:

mkdir -p storage/models/vision/moca_maskrcnn;
wget https://alfred-colorswap.s3.us-east-2.amazonaws.com/weight_maskrcnn.pt -O storage/models/vision/moca_maskrcnn/weight_maskrcnn.pt; 

We extract visual features for training trajectories using the following command:

sh scripts/generate_moca_maskrcnn.sh

You can refer to the actual extraction script scripts/generate_maskrcnn_horizon0.py for additional parameters. We executed this command on an p3.2xlarge instance with NVIDIA V100. This command will populate the directory storage/data/alfred/json_feat_2.1.0/ with the visual features for each trajectory step. In particular, the parameter --features_folder will specify the subdirectory (for each trajectory) that will contain the compressed NumPy files constituting the features. Each NumPy file has the following structure:

dict(
    box_features=np.array,
    roi_angles=np.array,
    boxes=np.array,
    masks=np.array,
    class_probs=np.array,
    class_labels=np.array,
    num_objects=int,
    pano_id=int
)

Data-augmentation procedure

In our paper, we describe a procedure to augment the ALFREd trajectories with object and corresponding receptacle information. In particular, we reply the trajectories and we make sure to track object and its receptacle during a subgoal. The data augmentation script will create a new trajectory file called ref_traj_data.json that mimics the same data structure of the original ALFRED dataset but adds to it a few fields for each action.

To start generating the refined data, use the following script:

PYTHONPATH=. python scripts/generate_landmarks.py 

EmBERT Training

Vocabulary creation

We use AllenNLP for training our models. Before starting the training we will generate the vocabulary for the model using the following command:

allennlp build-vocab training_configs/embert/embert_oscar.jsonnet storage/models/embert/vocab.tar.gz --include-package grolp

Training

First, we need to download the OSCAR checkpoint before starting the training process. We used a version of OSCAR which doesn't use object labels which can be freely downloaded following the instruction on GitHub. Make sure to download this file in the folder storage/models/pretrained using the following commands:

mkdir -p storage/models/pretrained/;
wget https://biglmdiag.blob.core.windows.net/oscar/pretrained_models/base-no-labels.zip -O storage/models/pretrained/oscar.zip;
unzip storage/models/pretrained/oscar.zip -d storage/models/pretrained/;
mv storage/models/pretrained/base-no-labels/ep_67_588997/pytorch_model.bin storage/models/pretrained/oscar-base-no-labels.bin;
rm storage/models/pretrained/oscar.zip;

A new model can be trained using the following command:

allennlp train training_configs/embert/embert_widest.jsonnet -s storage/models/alfred/embert --include-package grolp

When training for the first time, make sure to add to the previous command the following parameters: --preprocess --num_workers 4. This will make sure that the dataset is preprocessed and cached in order to speedup training. We run training using AWS EC2 instances p3.8xlarge with 16 workers on a single GPU per configuration.

The configuration file training_configs/embert/embert_widest.jsonnet contains all the parameters that you might be interested in if you want to change the way the model works or any reference to the actual features files. If you're interested in how to change the model itself, please refer to the model definition. The parameters in the constructor of the class will reflect the ones reported in the configuration file. In general, this project has been developed by using AllenNLP has a reference framework. We refer the reader to the official AllenNLP documentation for more details about how to structure a project.

EmBERT evaluation

We modified the original ALFRED evaluation script to make sure that the results are completely reproducible. Refer to the original repository for more information.

To run the evaluation on the valid_seen and valid_unseen you can use the provided script scripts/run_eval.sh in order to evaluate your model. The EmBERT trainer has different ways of saving checkpoints. At the end of the training, it will automatically save the best model in an archive named model.tar.gz in the destination folder (the one specified with -s). To evaluate it run the following command:

sh scripts/run_eval.sh <your_model_path>/model.tar.gz 

It's also possible to run the evaluation of a specific checkpoint. This can be done by running the previous command as follows:

sh scripts/run_eval.sh <your_model_path>/model-epoch=6.ckpt

In this way the evaluation script will load the checkpoint at epoch 6 in the path . When specifying a checkpoint directly, make sure that the folder contains both config.json file and vocabulary directory because they are required by the script to load all the correct model parameters.

Citation

If you're using this codebase please cite our work:

@article{suglia:embert,
  title={Embodied {BERT}: A Transformer Model for Embodied, Language-guided Visual Task Completion},
  author={Alessandro Suglia and Qiaozi Gao and Jesse Thomason and Govind Thattai and Gaurav Sukhatme},
  journal={arXiv},
  year={2021},
  url={https://arxiv.org/abs/2108.04927}
}
PyJPBoatRace: Python-based Japanese boatrace tools 🚤

pyjpboatrace :speedboat: provides you with useful tools for data analysis and auto-betting for boatrace.

5 Oct 29, 2022
A python project made to generate code using either OpenAI's codex or GPT-J (Although not as good as codex)

CodeJ A python project made to generate code using either OpenAI's codex or GPT-J (Although not as good as codex) Install requirements pip install -r

TheProtagonist 1 Dec 06, 2021
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 464 Jan 04, 2023
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

MLP Singer Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Neosapience 103 Dec 23, 2022
Word Bot for JKLM Bomb Party

Word Bot for JKLM Bomb Party A bot for Bomb Party on https://www.jklm.fun (Only English) Requirements pynput pyperclip pyautogui Usage: Step 1: Run th

Nicolas 7 Oct 30, 2022
Code for lyric-section-to-comment generation based on huggingface transformers.

CommentGeneration Code for lyric-section-to-comment generation based on huggingface transformers. Migrate Guyu model and code (both 12-layers and 24-l

Yawei Sun 8 Sep 04, 2021
A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

A simple Flask site that allows users to create, update, and delete posts in a database, as well as perform basic NLP tasks on the posts.

Ian 1 Jan 15, 2022
Ecco is a python library for exploring and explaining Natural Language Processing models using interactive visualizations.

Visualize, analyze, and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BER

Jay Alammar 1.6k Dec 25, 2022
Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Semantic Segmentation".

Dual Path Learning for Domain Adaptation of Semantic Segmentation Official PyTorch implementation of "Dual Path Learning for Domain Adaptation of Sema

27 Dec 22, 2022
Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Code for paper Multitask-Finetuning of Zero-shot Vision-Language Models

Zhenhailong Wang 2 Jul 15, 2022
Exploration of BERT-based models on twitter sentiment classifications

twitter-sentiment-analysis Explore the relationship between twitter sentiment of Tesla and its stock price/return. Explore the effect of different BER

Sammy Cui 2 Oct 02, 2022
PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation

SITT The repo contains official PyTorch Implementation of the paper Single Image Texture Translation for Data Augmentation. Authors: Boyi Li Yin Cui T

Boyi Li 52 Jan 05, 2023
2021搜狐校园文本匹配算法大赛baseline

sohu2021-baseline 2021搜狐校园文本匹配算法大赛baseline 简介 分享了一个搜狐文本匹配的baseline,主要是通过条件LayerNorm来增加模型的多样性,以实现同一模型处理不同类型的数据、形成不同输出的目的。 线下验证集F1约0.74,线上测试集F1约0.73。

苏剑林(Jianlin Su) 45 Sep 06, 2022
Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph",

K-BERT Sorce code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph", which is implemented based on the UER framework. R

Weijie Liu 834 Jan 09, 2023
This repository has a implementations of data augmentation for NLP for Japanese.

daaja This repository has a implementations of data augmentation for NLP for Japanese: EDA: Easy Data Augmentation Techniques for Boosting Performance

Koga Kobayashi 60 Nov 11, 2022
Signature remover is a NLP based solution which removes email signatures from the rest of the text.

Signature Remover Signature remover is a NLP based solution which removes email signatures from the rest of the text. It helps to enchance data conten

Forges Alterway 8 Jan 06, 2023
Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

EasyNMT - Easy to use, state-of-the-art Neural Machine Translation This package provides easy to use, state-of-the-art machine translation for more th

Ubiquitous Knowledge Processing Lab 748 Jan 06, 2023
Stack based programming language that compiles to x86_64 assembly or can alternatively be interpreted in Python

lang lang is a simple stack based programming language written in Python. It can

Christoffer Aakre 1 May 30, 2022
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE is accpeted by ACL22 main conference as a long pape

Lihu Chen 32 Jan 03, 2023
A number of methods in order to perform Natural Language Processing on live data derived from Twitter

A number of methods in order to perform Natural Language Processing on live data derived from Twitter

1 Nov 24, 2021