Towards End-to-end Video-based Eye Tracking

Last update: Dec 12, 2022

The code accompanying our ECCV 2020 publication and dataset, EVE.

Authors: Seonwook Park, Emre Aksan, Xucong Zhang, and Otmar Hilliges
Project page: https://ait.ethz.ch/projects/2020/EVE/
Codalab (test set evaluation and public leaderboard): https://competitions.codalab.org/competitions/28954

Setup

Preferably, set up a Docker image or virtual environment (virtualenvwrapper is recommended) for this repository. Please note that we have tested this code-base in the following environments:

Ubuntu 18.04 / A Linux-based cluster system (CentOS 7.8)
Python 3.6 / Python 3.7
PyTorch 1.5.1

Clone this repository somewhere with:

git clone git@github.com:swook/EVE
cd EVE/

Then from the base directory of this repository, install all dependencies with:

pip install -r requirements.txt

Please refer to the official PyTorch installation guide for setting up the torch and torchvision packages on your specific system.

You will also need to set up ffmpeg for video decoding. On Linux, we recommend installing distribution-specific packages (usually named ffmpeg). If necessary, check out the official download page or compilation instructions.
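
Before running anything, it can help to confirm that an ffmpeg executable is actually reachable. The following is a minimal sketch using only the Python standard library; the helper name find_ffmpeg is hypothetical and not part of this repository.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable found on PATH, or None."""
    # shutil.which searches the directories listed in PATH, like the shell does.
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    location = find_ffmpeg()
    if location is None:
        print("ffmpeg was not found on PATH; install it before decoding videos.")
    else:
        print("Found ffmpeg at " + location)
```

This only checks that the executable exists. It does not verify the ffmpeg version or which codecs it was built with.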

Usage

Information on the code framework

Configuration file system

All available configuration parameters are defined in src/core/config_default.py.

To override the default values, do one of the following:

Pass the parameter as a command-line argument to train.py or inference.py. In this case, replace all _ characters with -. E.g. the config parameter refine_net_enabled becomes --refine-net-enabled 1. Boolean parameters can be passed in as either 0/no/false or 1/yes/true.
Create a JSON file such as src/configs/eye_net.json or src/configs/refine_net.json.
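
The naming and boolean conventions described above can be sketched as follows. This is an illustration only, not the repository's actual parser: the helper names to_cli_flag and parse_bool are hypothetical, and treating boolean values case-insensitively is an assumption.

```python
def to_cli_flag(param_name: str) -> str:
    """Turn a config parameter name into its command-line flag.

    Example: refine_net_enabled -> --refine-net-enabled
    """
    return "--" + param_name.replace("_", "-")


# Accepted spellings for boolean values, as listed in the text above.
TRUE_VALUES = {"1", "yes", "true"}
FALSE_VALUES = {"0", "no", "false"}


def parse_bool(value: str) -> bool:
    """Parse a boolean given as 0/no/false or 1/yes/true.

    Assumption: matching is case-insensitive and ignores surrounding spaces.
    """
    normalized = value.strip().lower()
    if normalized in TRUE_VALUES:
        return True
    if normalized in FALSE_VALUES:
        return False
    raise ValueError("not a recognized boolean value: " + repr(value))


if __name__ == "__main__":
    print(to_cli_flag("refine_net_enabled"), int(parse_bool("yes")))
```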

The order of application is:

Default parameters
JSON-provided parameters, in order of JSON file declaration. For instance, in the command python train.py config1.json config2.json, config2.json overrides config1.json entries should there be any overlap.
CLI-provided parameters.
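
The precedence above amounts to layering dictionaries, where later sources overwrite earlier ones. The sketch below shows that idea under stated assumptions: the function resolve_config and the parameter name batch_size are hypothetical, the JSON files are assumed to be flat objects, and the real logic in src/core/ may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict, Sequence


def resolve_config(
    defaults: Dict[str, Any],
    json_paths: Sequence[str],
    cli_overrides: Dict[str, Any],
) -> Dict[str, Any]:
    """Apply defaults, then JSON files in the given order, then CLI values."""
    config = dict(defaults)  # copy so the defaults stay untouched
    for path in json_paths:
        # A later JSON file overwrites overlapping keys of an earlier one.
        config.update(json.loads(Path(path).read_text()))
    # CLI-provided values are applied last and therefore win.
    config.update(cli_overrides)
    return config
```

For example, with defaults {"batch_size": 16}, a config1.json containing {"batch_size": 32}, a config2.json containing {"batch_size": 64} and no CLI overrides, the resolved value is 64. Passing {"batch_size": 8} as a CLI override would make it 8.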

Automatic logging to Google Sheets

This framework implements automatic logging of all parameters, loss terms, and metrics to a Google Sheets document, using the gspread library. To enable this feature, follow these instructions:

Follow the instructions at https://gspread.readthedocs.io/en/latest/oauth2.html#for-end-users-using-oauth-client-id
Set --gsheet-secrets-json-file to a path to the credentials JSON file, and set --gsheet-workbook-key to the document key. This key is the part after https://docs.google.com/spreadsheets/d/ and before any query or hash parameters.

An example config JSON file can be found at src/configs/sample_gsheet.json.
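
If you only have the full URL of the spreadsheet, the document key can be extracted along the lines of the sketch below. The helper workbook_key_from_url is hypothetical and not part of this repository. It assumes that any trailing path segment after the key (such as /edit) should be dropped along with the query and hash parts.

```python
from urllib.parse import urlparse

SHEETS_HOST = "docs.google.com"
SHEETS_PATH_PREFIX = "/spreadsheets/d/"


def workbook_key_from_url(url: str) -> str:
    """Extract the document key from a Google Sheets URL."""
    parsed = urlparse(url)  # separates the path from query and fragment
    if parsed.netloc != SHEETS_HOST or not parsed.path.startswith(SHEETS_PATH_PREFIX):
        raise ValueError("not a Google Sheets document URL: " + url)
    remainder = parsed.path[len(SHEETS_PATH_PREFIX):]
    # Assumption: the key is the first path segment after the prefix.
    key = remainder.split("/", 1)[0]
    if not key:
        raise ValueError("no document key found in URL: " + url)
    return key
```

The returned string is what would be passed to --gsheet-workbook-key.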

Training a model

To train a model, simply run python train.py from src/ with the desired configuration changes (see "Configuration file system" above).

Note that in order to resume the training of an existing model, you must provide the path to its output folder via the --resume-from argument.

Also, at every fresh run of train.py, a unique identifier is generated to produce a unique output folder in outputs/EVE/. Hence, it is recommended to use the Google Sheets logging feature (see "Automatic logging to Google Sheets") to keep track of your models.

Running inference

The single-sample inference script at src/inference.py takes in the same arguments as train.py but expects two arguments in particular:

--input-path is the path to a basler.mp4 or webcam_l.mp4 or webcam_c.mp4 or webcam_r.mp4 that exists in the EVE dataset.
--output-path is a path to a desired output location (ending in .mp4).

This script works for training, validation, and test samples, and shows the reference point-of-gaze ground-truth when available.
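
A small pre-flight check of the two paths can catch mistakes before a long inference run starts. The sketch below is not part of src/inference.py, and the helper check_inference_paths is hypothetical. It only inspects the file names listed above and does not check whether the input file exists.

```python
from pathlib import Path
from typing import List

# Video file names that the text above lists as valid inputs.
EVE_VIDEO_NAMES = {"basler.mp4", "webcam_l.mp4", "webcam_c.mp4", "webcam_r.mp4"}


def check_inference_paths(input_path: str, output_path: str) -> List[str]:
    """Return a list of problems found; an empty list means the paths look fine."""
    problems = []
    if Path(input_path).name not in EVE_VIDEO_NAMES:
        problems.append(
            "--input-path should point to one of: " + ", ".join(sorted(EVE_VIDEO_NAMES))
        )
    if Path(output_path).suffix.lower() != ".mp4":
        problems.append("--output-path should end in .mp4")
    return problems
```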

Citation

If using this code-base and/or the EVE dataset in your research, please cite the following publication:

@inproceedings{Park2020ECCV,
  author    = {Seonwook Park and Emre Aksan and Xucong Zhang and Otmar Hilliges},
  title     = {Towards End-to-end Video-based Eye-Tracking},
  year      = {2020},
  booktitle = {European Conference on Computer Vision (ECCV)}
}

Q&A

Q: How do I use this code for screen-based eye tracking?

A: This code does not offer actual eye tracking. Rather, it concerns the benchmarking of the video-based gaze estimation methods outlined in the original paper. Extending this code into easy-to-use software for screen-based eye tracking is non-trivial, due to the requirements on camera calibration (intrinsics, extrinsics) and the need for an efficient pipeline for accurate and stable real-time eye or face patch extraction. Thus, we consider this to be beyond the scope of this code repository.

Q: Where are the test set labels?

A: Our public evaluation server and leaderboard are hosted by Codalab at https://competitions.codalab.org/competitions/28954. This allows evaluations on our test set to be consistent and reliable, and encourages competition in the field of video-based gaze estimation. Please note that the performance reported by Codalab is not, strictly speaking, comparable to the original paper's results, as we only perform evaluation on a large subset of the full test set. We recommend acquiring the updated performance figures from the leaderboard.

Comments

use against new dataset

Hi,

Can this code be used at inference time on in-the-wild mp4 files that do not necessarily come with an accompanying H5 file? The more I work with this codebase, the more obvious it looks that this will not work unless the mp4 is Tobii-generated. Is this true?

Thank you

opened by inisar 0
File name parser

The file name parser can be made more robust to your own dataset files.
Currently it doesn't work for both webcam_l.mp4 and webcam_l_eyes.mp4. Please see below for the file name and the correction I made to make it work.

src/core/inference.py

try:
    camera_type = components[-1][:-4]
except AssertionError:
    camera_type = camera_type[:-5]

opened by inisar 0
How to synchronize the data from camera and eye tracker?

Hi, @swook. I use OpenCV to capture the frames. What bothers me is that I don't know how to attach a timestamp to each frame and ensure that the interval between timestamps is nearly constant. By using datetime.time(), I can get the current time and treat it as the timestamp, but the interval between consecutive timestamps varies and sometimes shows a big gap. Could you share some details about the method you used to synchronize the data? It would also be very nice if you could share the source code with me. Thanks.

opened by Kihensarn 0
How to get the 3D gaze origin

Hi, @swook. Thanks for your great work. I have a question about how to get the 3D gaze origin (determined during data pre-processing). The paper says "In pre-processing the EVE dataset, we apply a 3DMM fitting approach with interocular-distance-based scale-normalization to alleviate these issues". However, I'm not sure about the specific process of this step. What should I do if I want to convert from landmarks to the 3D gaze origin? Also, would it be possible to release some of the code for this part? Thanks a lot!

opened by TeresaKumo 0
About the result

I trained the EVE model with the EVE data, ran eval_codalab.py and got a pkl file as a result. I also ran eval_codalab.py and got a pkl file from the pretrained model weights (from https://github.com/swook/EVE/releases/tag/v0.0 - eve_refinenet_CGRU_oa_skip.pt). Then, I compared these two results and the numbers seem to match. For example, from the pretrained model, I got [960. 540.] for PoG_px_final and got [963.0835 650.5635] for my model.

However, in the EVE paper, Table 3 shows that the PoG_px of the GRU model with oa+skip is 95.59. The numbers in the paper are about 1/10 of the numbers I got from eval_codalab.py, and I am not sure what went wrong. Are they supposed to match? If they are not supposed to match, how do you calculate the numbers?

Also, the result page of Codalab shows the gaze direction (angular error), but eval_codalab.py doesn't store the gaze direction. (Keys_to_store=['left pupil size' , 'right pupil', 'pog__px_initial', 'pog_px_final', 'timestamp']) How should I get the gaze direction error in degrees?

opened by chaeyoun 1

Releases (v0.0)

v0.0 (Sep 20, 2020)

This release contains pre-trained model weight files for the models shown in Tables 2 and 3 of the EVE paper.

Note: the "source code" archive attached to this release is an empty archive.
Source code (tar.gz)
Source code (zip)
eve_eyenet_GRU.pt (43.48 MB)
eve_eyenet_LSTM.pt (43.61 MB)
eve_eyenet_RNN.pt (43.23 MB)
eve_eyenet_static.pt (43.17 MB)
eve_refinenet_CGRU_oa.pt (17.82 MB)
eve_refinenet_CGRU_oa_skip.pt (20.11 MB)
eve_refinenet_CLSTM_oa.pt (18.10 MB)
eve_refinenet_CLSTM_oa_skip.pt (20.39 MB)
eve_refinenet_CRNN_oa.pt (17.26 MB)
eve_refinenet_CRNN_oa_skip.pt (19.55 MB)
eve_refinenet_static.pt (16.97 MB)
eve_refinenet_static_oa.pt (16.97 MB)
eve_refinenet_static_oa_skip.pt (19.27 MB)
sample_codalab_submission.zip (13.82 MB)

Owner

Seonwook Park
