Implementation of "With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition, BMVC, 2021" in PyTorch

Overview

Multimodal Temporal Context Network (MTCN)

This repository implements the model proposed in the paper:

Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen, With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition, BMVC, 2021

Project webpage

arXiv paper

Citing

When using this code, kindly reference:

@INPROCEEDINGS{kazakos2021MTCN,
  author={Kazakos, Evangelos and Huh, Jaesung and Nagrani, Arsha and Zisserman, Andrew and Damen, Dima},
  booktitle={British Machine Vision Conference (BMVC)},
  title={With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition},
  year={2021}}

NOTE

Although we train MTCN using visual SlowFast features extracted from a model trained with 2s video clips, in Table 3 of our paper and Table 1 of the Appendix (Table 6 in the arXiv version), where we compare MTCN with the state of the art, the reported SlowFast results are from [1], where the model is trained with 1s video clips. In the following table, we provide the results of SlowFast trained with 2s clips, for a direct comparison, as this is the model we use to extract the visual features.

[Table: results of SlowFast trained with 2s video clips]

Requirements

The project's requirements can be installed in a separate conda environment by running the following command in your terminal: $ conda env create -f environment.yml.

Features

The extracted features for each dataset can be downloaded using the following links:

EPIC-KITCHENS-100:

EGTEA:
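If you want to sanity-check a downloaded feature file before training, the following is a minimal sketch (not part of this repository) that lists the arrays stored in one of the HDF5 files; it assumes h5py is installed and reuses a file path from the commands below, and the internal layout is simply whatever the file reports:

import h5py

with h5py.File("/path/to/epic-kitchens-100/features/audiovisual_slowfast_features_val.hdf5", "r") as f:
    # Walk the file and print every dataset with its shape and dtype.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)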

Pretrained models

We provide pretrained models for EPIC-KITCHENS-100:

  • Audio-visual transformer link
  • Language model link
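To check a downloaded checkpoint before plugging it into the test scripts, here is a minimal sketch (not part of this repository); it assumes the .pyth files are ordinary PyTorch checkpoints loadable with torch.load, and it only prints their top-level keys:

import torch

# Load on CPU so no GPU is required just to inspect the file.
checkpoint = torch.load("/path/to/av_model/av_checkpoint.pyth", map_location="cpu")
print(type(checkpoint))
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))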

Ground-truth
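The --train_pickle, --val_pickle and --test_pickle arguments in the commands below point to the ground-truth annotation files. As a minimal sketch (assuming the pickle stores a pandas DataFrame, as in the official epic-kitchens-100-annotations repository), you can inspect one with:

import pandas as pd

annotations = pd.read_pickle("/path/to/epic-kitchens-100-annotations/EPIC_100_train.pkl")
print(annotations.shape)   # number of action segments x number of annotation columns
print(annotations.head())  # first few rows (segment ids, timestamps, verb/noun labels)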

Train

EPIC-KITCHENS-100

To train the audio-visual transformer on EPIC-KITCHENS-100, run:

python train_av.py --dataset epic-100 --train_hdf5_path /path/to/epic-kitchens-100/features/audiovisual_slowfast_features_train.hdf5 \
--val_hdf5_path /path/to/epic-kitchens-100/features/audiovisual_slowfast_features_val.hdf5 \
--train_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_train.pkl \
--val_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_validation.pkl \
--batch-size 32 --lr 0.005 --optimizer sgd --epochs 100 --lr_steps 50 75 --output_dir /path/to/output_dir \
--num_layers 4 -j 8 --classification_mode all --seq_len 9

To train the language model on EPIC-KITCHENS-100, run:

python train_lm.py --dataset epic-100 --train_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_train.pkl \
--val_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_validation.pkl \
--verb_csv /path/to/epic-kitchens-100-annotations/EPIC_100_verb_classes.csv \
--noun_csv /path/to/epic-kitchens-100-annotations/EPIC_100_noun_classes.csv \
--batch-size 64 --lr 0.001 --optimizer adam --epochs 100 --lr_steps 50 75 --output_dir /path/to/output_dir \
--num_layers 4 -j 8 --num_gram 9 --dropout 0.1

EGTEA

To train the visual-only transformer on EGTEA (EGTEA does not have audio), run:

python train_av.py --dataset egtea --train_hdf5_path /path/to/egtea/features/visual_slowfast_features_train_split1.hdf5 \
--val_hdf5_path /path/to/egtea/features/visual_slowfast_features_test_split1.hdf5 \
--train_pickle /path/to/EGTEA_annotations/train_split1.pkl --val_pickle /path/to/EGTEA_annotations/test_split1.pkl \
--batch-size 32 --lr 0.001 --optimizer sgd --epochs 50 --lr_steps 25 38 --output_dir /path/to/output_dir \
--num_layers 4 -j 8 --classification_mode all --seq_len 9

To train the language model on EGTEA, run:

python train_lm.py --dataset egtea --train_pickle /path/to/EGTEA_annotations/train_split1.pkl \
--val_pickle /path/to/EGTEA_annotations/test_split1.pkl \
--action_csv /path/to/EGTEA_annotations/actions_egtea.csv \
--batch-size 64 --lr 0.001 --optimizer adam --epochs 50 --lr_steps 25 38 --output_dir /path/to/output_dir \
--num_layers 4 -j 8 --num_gram 9 --dropout 0.1

Test

EPIC-KITCHENS-100

To test the audio-visual transformer on EPIC-KITCHENS-100, run:

python test_av.py --dataset epic-100 --test_hdf5_path /path/to/epic-kitchens-100/features/audiovisual_slowfast_features_val.hdf5 \
--test_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_validation.pkl \
--checkpoint /path/to/av_model/av_checkpoint.pyth --seq_len 9 --num_layers 4 --output_dir /path/to/output_dir \
--split validation

To obtain the model's scores on the test set, simply use --test_hdf5_path /path/to/epic-kitchens-100/features/audiovisual_slowfast_features_test.hdf5, --test_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_test_timestamps.pkl and --split test instead. Since the labels for the test set are not available, the script will simply save the scores without computing the model's accuracy.

To evaluate your model on the validation set, follow the instructions in this link. The same link also contains instructions for preparing the model's scores for submission to the evaluation server, in order to obtain results on the test set.

Finally, to filter out improbable sequences using LM, run:

python test_av_lm.py --dataset epic-100 \
--test_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_validation.pkl \
--test_scores /path/to/audio-visual-results.pkl \
--checkpoint /path/to/lm_model/lm_checkpoint.pyth \
--num_gram 9 --split validation

Note that --test_scores /path/to/audio-visual-results.pkl points to the scores predicted by the audio-visual transformer. To obtain scores on the test set, use --test_pickle /path/to/epic-kitchens-100-annotations/EPIC_100_test_timestamps.pkl and --split test instead.
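If you want to inspect the saved scores before passing them to test_av_lm.py, here is a minimal sketch (not part of this repository); the exact structure of audio-visual-results.pkl is an assumption, so the snippet only loads the pickle and reports what it contains:

import pickle

with open("/path/to/audio-visual-results.pkl", "rb") as f:
    scores = pickle.load(f)
print(type(scores))
if isinstance(scores, dict):
    for key, value in scores.items():
        print(key, getattr(value, "shape", type(value)))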

Since we provide the trained models for EPIC-KITCHENS-100, av_checkpoint.pyth and lm_checkpoint.pyth in the test scripts above can be either the provided pretrained models or model_best.pyth, i.e. your own trained model.

EGTEA

To test the visual-only transformer on EGTEA, run:

python test_av.py --dataset egtea --test_hdf5_path /path/to/egtea/features/visual_slowfast_features_test_split1.hdf5 \
--test_pickle /path/to/EGTEA_annotations/test_split1.pkl \
--checkpoint /path/to/v_model/model_best.pyth --seq_len 9 --num_layers 4 --output_dir /path/to/output_dir \
--split test_split1

To filter out improbable sequences using LM, run:

python test_av_lm.py --dataset egtea \
--test_pickle /path/to/EGTEA_annotations/test_split1.pkl \
--test_scores /path/to/visual-results.pkl \
--checkpoint /path/to/lm_model/model_best.pyth \
--num_gram 9 --split test_split1

In each case, you can extract attention weights by simply including --extract_attn_weights among the input arguments of the test script.

References

[1] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Jian Ma, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray, Rescaling Egocentric Vision: Collection Pipeline and Challenges for EPIC-KITCHENS-100, IJCV, 2021

License

The code is published under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, found here.
