YouRefIt: Embodied Reference Understanding with Language and Gesture

Last update: Jul 11, 2022

by Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu and Siyuan Huang

The IEEE International Conference on Computer Vision (ICCV), 2021

Introduction

We study the machine's understanding of embodied reference: One agent uses both language and gesture to refer to an object to another agent in a shared physical environment. To tackle this problem, we introduce YouRefIt, a new crowd-sourced, real-world dataset of embodied reference.

For more details, please refer to our paper.

Checklist

Image ERU
Video ERU

Installation

The code was tested in the following environment: Ubuntu 18.04/20.04, Python 3.7/3.8, PyTorch 1.9.1. Run

    git clone https://github.com/yixchen/YouRefIt_ERU
    cd YouRefIt_ERU
    pip install -r requirements.txt
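
Before training, it may help to confirm that the local setup matches the tested environment. The snippet below is a minimal sketch and not part of the repository: `check_versions` and `installed_torch_version` are hypothetical helper names, and the tested versions are copied from the line above.

```python
import importlib.util
import sys

# Versions listed as tested in this README.
TESTED_PYTHON = {(3, 7), (3, 8)}
TESTED_TORCH = "1.9.1"


def check_versions(python_version, torch_version):
    """Return a list of warnings; an empty list means the tested setup is matched.

    python_version: a tuple such as sys.version_info.
    torch_version: a version string, or None if PyTorch is not installed.
    """
    problems = []
    major, minor = tuple(python_version[:2])
    if (major, minor) not in TESTED_PYTHON:
        problems.append(f"Python {major}.{minor} is outside the tested 3.7/3.8 range")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif torch_version.split("+")[0] != TESTED_TORCH:
        # Strip a local build suffix such as "+cu111" before comparing.
        problems.append(f"PyTorch {torch_version} differs from the tested {TESTED_TORCH}")
    return problems


def installed_torch_version():
    """Return the installed PyTorch version string, or None if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return str(torch.__version__)


if __name__ == "__main__":
    for problem in check_versions(sys.version_info, installed_torch_version()):
        print("warning:", problem)
```

A mismatch is only a warning: other versions may work, but they were not tested.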

Dataset

Download the YouRefIt dataset from the Dataset Request Page and place it under ./ln_data.

Model weights

YOLOv3: download the pretrained model and place the file in ./saved_models by running
```
sh saved_models/yolov3_weights.sh
```
More pretrained models are available on Google Drive and should also be placed in ./saved_models.

Make sure to put the files in the following structure:

|-- ROOT
|   |-- ln_data
|       |-- yourefit
|           |-- images
|           |-- paf
|           |-- saliency
|   |-- saved_models
|       |-- final_model_full.tar
|       |-- final_resc.tar
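
The layout above can be checked before running anything. The following is a minimal sketch, not a script shipped with the repository: `missing_paths` is a hypothetical helper, and the path list only covers the entries shown in the tree (the YOLOv3 weight file is not checked because its filename is not given here).

```python
from pathlib import Path

# Entries taken from the directory tree in this README, relative to ROOT.
EXPECTED_PATHS = [
    "ln_data/yourefit/images",
    "ln_data/yourefit/paf",
    "ln_data/yourefit/saliency",
    "saved_models/final_model_full.tar",
    "saved_models/final_resc.tar",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are in place.")
```

Run it from the repository root; an empty result means every listed file and folder was found.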

Training

To train the model, run the following from the main folder of the repository, replacing gpu_id with the ID of the GPU to use:

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id

Evaluation

To evaluate the model, run the following from the main folder of the repository. Use the --test flag to enable test mode.

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
     --resume saved_models/model.pth.tar \
     --test

Evaluate Image ERU on our released model

To evaluate our full model with PAF and saliency features, run

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
     --resume saved_models/final_model_full.tar --use_paf --use_sal --large --test

To evaluate the baseline model that only takes images as input, run

    python train.py --data_root ./ln_data/ --dataset yourefit --gpu gpu_id \
     --resume saved_models/final_resc.tar --large --test
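
The two evaluation commands differ only in the checkpoint and in the PAF and saliency flags. The sketch below assembles both from one helper so the difference is explicit. `build_eval_command` is a hypothetical name introduced for illustration; it uses only the flags shown in the commands above, and GPU ID 0 is an assumed example value.

```python
def build_eval_command(checkpoint, gpu_id, use_paf=False, use_sal=False):
    """Build the evaluation command as an argument list (hypothetical helper)."""
    cmd = [
        "python", "train.py",
        "--data_root", "./ln_data/",
        "--dataset", "yourefit",
        "--gpu", str(gpu_id),
        "--resume", checkpoint,
    ]
    if use_paf:
        cmd.append("--use_paf")  # pass the PAF feature to the model
    if use_sal:
        cmd.append("--use_sal")  # pass the saliency feature to the model
    cmd += ["--large", "--test"]  # both released checkpoints are run with these
    return cmd


# Full model: PAF and saliency features enabled.
full = build_eval_command("saved_models/final_model_full.tar", 0,
                          use_paf=True, use_sal=True)
# Baseline: image input only.
baseline = build_eval_command("saved_models/final_resc.tar", 0)

print(" ".join(full))
print(" ".join(baseline))
```

Either list can be passed to `subprocess.run` from the repository root to launch the evaluation.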

To evaluate the inference results on the test set at different IoU levels, change the path accordingly and run

    python evaluate_results.py
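
For readers unfamiliar with the metric, the sketch below shows how accuracy at several IoU levels is commonly computed. It is an illustration under stated assumptions, not the contents of evaluate_results.py: boxes are assumed to be (x1, y1, x2, y2) tuples, a prediction is assumed to count as correct when its IoU is at least the threshold, and the thresholds 0.25, 0.5 and 0.75 are example values.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(predictions, ground_truths, thresholds=(0.25, 0.5, 0.75)):
    """Fraction of predictions whose IoU with the ground truth meets each threshold."""
    ious = [box_iou(p, g) for p, g in zip(predictions, ground_truths)]
    if not ious:
        return {t: 0.0 for t in thresholds}
    return {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}


if __name__ == "__main__":
    preds = [(0, 0, 2, 2), (0, 0, 2, 2), (0, 0, 4, 4)]
    gts = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 0, 4, 2)]
    for threshold, acc in accuracy_at_iou(preds, gts).items():
        print(f"Acc@{threshold}: {acc:.3f}")
```

Stricter thresholds require tighter localization, so accuracy can only stay the same or drop as the IoU level increases.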

Citation

@inProceedings{chen2021yourefit,
 title={YouRefIt: Embodied Reference Understanding with Language and Gesture},
 author = {Chen, Yixin and Li, Qing and Kong, Deqian and Kei, Yik Lun and Zhu, Song-Chun and Gao, Tao and Zhu, Yixin and Huang, Siyuan},
 booktitle={The IEEE International Conference on Computer Vision (ICCV)},
 year={2021}
 }

Acknowledgement

Our code is built on ReSC and we thank the authors for their hard work.
