Benchmarking the robustness of Spatial-Temporal Models

Overview

Benchmarking the robustness of Spatial-Temporal Models

This repositery contains the code for the paper Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions.

Python 2.7 and 3.7, Pytorch 1.7+, FFmpeg are required.

Requirements

pip3 install - requirements.txt

Mini Kinetics-C

image info

Download original Kinetics400 from link.

The Mini Kinetics-C contains half of the classes in Kinetics400. All the classes can be found in mini-kinetics-200-classes.txt.

Mini Kinetics-C Leaderboard

Corruption robustness of spatial-temporal models trained on clean Mini Kinetics and evaluated on Mini Kinetics-C.

Approach Reference Backbone Input Length Sampling Method Clean Accuracy mPC rPC
TimeSformer Gedas et al. Transformer 32 Uniform 82.2 71.4 86.9
3D ResNet K. Hara et al. ResNet-50 32 Uniform 73.0 59.2 81.1
I3D J. Carreira et al. InceptionV1 32 Uniform 70.5 57.7 81.8
SlowFast 8x4 C. Feichtenhofer at al. ResNet-50 32 Uniform 69.2 54.3 78.5
3D ResNet K. Hara et al. ResNet-18 32 Uniform 66.2 53.3 80.5
TAM Q.Fan et al. ResNet-50 32 Uniform 66.9 50.8 75.9
X3D-M C. Feichtenhofer ResNet-50 32 Uniform 62.6 48.6 77.6

For fair comparison, it is recommended to submit the result of approach which follows the following settings: Backbone of ResNet-50, Input Length of 32, Uniform Sampling at Clip Level. Any result on our benchmark can be submitted via pull request.

Mini SSV2-C

image info

Download original Something-Something-V2 datset from link.

The Mini SSV2-C contains half of the classes in Something-Something-V2. All the classes can be found in mini-ssv2-87-classes.txt.

Mini SSV2-C Leaderboard

Corruption robustness of spatial-temporal models trained on clean Mini SSV2 and evaluated on Mini SSV2-C.

Approach Reference Backbone Input Length Sampling Method Clean Accuracy mPC rPC
TimeSformer Gedas et al. Transformer 16 Uniform 60.5 49.7 82.1
I3D J. Carreira et al. InceptionV1 32 Uniform 58.5 47.8 81.7
3D ResNet K. Hara et al. ResNet-50 32 Uniform 57.4 46.6 81.2
TAM Q.Fan et al. ResNet-50 32 Uniform 61.8 45.7 73.9
3D ResNet K. Hara et al. ResNet-18 32 Uniform 53.0 42.6 80.3
X3D-M C. Feichtenhofer ResNet-50 32 Uniform 49.9 40.7 81.6
SlowFast 8x4 C. Feichtenhofer at al. ResNet-50 32 Uniform 48.7 38.4 78.8

For fair comparison, it is recommended to submit the result of approach which follows the following settings: Backbone of ResNet-50, Input Length of 32, Uniform Sampling at Clip Level. Any result on our benchmark can be submitted via pull request.

Training and Evaluation

To help researchers reproduce the benchmark results provided in our leaderboard, we include a simple framework for training and evaluating the spatial-temporal models in the folder: benchmark_framework.

Running the code

Assume the structure of data directories is the following:

~/
  datadir/
    mini_kinetics/
      train/
        .../ (directories of class names)
          ...(hdf5 file containing video frames)
    mini_kinetics-c/
      .../ (directories of corruption names)
        .../ (directories of severity level)
          .../ (directories of class names)
            ...(hdf5 file containing video frames)

Train I3D on the Mini Kinetics dataset with 4 GPUs and 16 CPU threads (for data loading). The input lenght is 32, the batch size is 32 and learning rate is 0.01.

python3 train.py --threed_data --dataset mini_kinetics400 --frames_per_group 1 --groups 32 --logdir snapshots/ \
--lr 0.01 --backbone_net i3d -b 32 -j 16 --cuda 0,1,2,3

Test I3D on the Mini Kinetics-C dataset (pretrained model is loaded)

python3 test_corruption.py --threed_data --dataset mini_kinetics400 --frames_per_group 1 --groups 32 --logdir snapshots/ \
--pretrained snapshots/mini_kinetics400-rgb-i3d_v2-ts-max-f32-cosine-bs32-e50-v1/model_best.pth.tar --backbone_net i3d -b 32 -j 16 -e --cuda 0,1,2,3

Owner
Yi Chenyu Ian
Yi Chenyu Ian
Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

Auto-ViML Automatically Build Variant Interpretable ML models fast! Auto_ViML is pronounced "auto vimal" (autovimal logo created by Sanket Ghanmare) N

AutoViz and Auto_ViML 397 Dec 30, 2022
Structure-Preserving Deraining with Residue Channel Prior Guidance (ICCV2021)

SPDNet Structure-Preserving Deraining with Residue Channel Prior Guidance (ICCV2021) Requirements Linux Platform NVIDIA GPU + CUDA CuDNN PyTorch == 0.

41 Dec 12, 2022
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

Xcessiv Xcessiv is a tool to help you create the biggest, craziest, and most excessive stacked ensembles you can think of. Stacked ensembles are simpl

Reiichiro Nakano 1.3k Nov 17, 2022
Real life contra a deep learning project built using mediapipe and openc

real-life-contra Description A python script that translates the body movement into in game control. Welcome to all new real life contra a deep learni

Programminghut 7 Jan 26, 2022
CrossMLP - The repository offers the official implementation of our BMVC 2021 paper (oral) in PyTorch.

CrossMLP Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation Bin Ren1, Hao Tang2, Nicu Sebe1. 1University of Trento, Italy, 2ETH, Switzerla

Bingoren 16 Jul 27, 2022
10x faster matrix and vector operations

Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations. If yo

2.3k Jan 09, 2023
A web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks

This project is a web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks. Thanks for NVlabs' excelle

K.L. 150 Dec 15, 2022
🎃 Core identification module of AI powerful point reading system platform.

ppReader-Kernel Intro Core identification module of AI powerful point reading system platform. Usage 硬件: Windows10、GPU:nvdia GTX 1060 、普通RBG相机 软件: con

CrashKing 1 Jan 11, 2022
Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation" by Shizhe Diao et al.

T-DNA Source code for the ACL-IJCNLP 2021 paper entitled Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adapta

shizhediao 17 Dec 22, 2022
Steer OpenAI's Jukebox with Music Taggers

TagBox Steer OpenAI's Jukebox with Music Taggers! The closest thing we have to VQGAN+CLIP for music! Unsupervised Source Separation By Steering Pretra

Ethan Manilow 34 Nov 02, 2022
Pyramid addon for OpenAPI3 validation of requests and responses.

Validate Pyramid views against an OpenAPI 3.0 document Peace of Mind The reason this package exists is to give you peace of mind when providing a REST

Pylons Project 79 Dec 30, 2022
Numerai tournament example scripts using NN and optuna

numerai_NN_example Numerai tournament example scripts using pytorch NN, lightGBM and optuna https://numer.ai/tournament Performance of my model based

Takahiro Maeda 12 Oct 10, 2022
This is a model made out of Neural Network specifically a Convolutional Neural Network model

This is a model made out of Neural Network specifically a Convolutional Neural Network model. This was done with a pre-built dataset from the tensorflow and keras packages. There are other alternativ

9 Oct 18, 2022
Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Knover Knover is a toolkit for knowledge grounded dialogue generation based on PaddlePaddle. Knover allows researchers and developers to carry out eff

607 Dec 31, 2022
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
[CVPR 2021] Teachers Do More Than Teach: Compressing Image-to-Image Models (CAT)

CAT arXiv Pytorch implementation of our method for compressing image-to-image models. Teachers Do More Than Teach: Compressing Image-to-Image Models Q

Snap Research 160 Dec 09, 2022
Project page for End-to-end Recovery of Human Shape and Pose

End-to-end Recovery of Human Shape and Pose Angjoo Kanazawa, Michael J. Black, David W. Jacobs, Jitendra Malik CVPR 2018 Project Page Requirements Pyt

1.4k Dec 29, 2022
(JMLR' 19) A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)

Python Outlier Detection (PyOD) Deployment & Documentation & Stats & License PyOD is a comprehensive and scalable Python toolkit for detecting outlyin

Yue Zhao 6.6k Jan 05, 2023
Implementation and replication of ProGen, Language Modeling for Protein Generation, in Jax

ProGen - (wip) Implementation and replication of ProGen, Language Modeling for Protein Generation, in Pytorch and Jax (the weights will be made easily

Phil Wang 71 Dec 01, 2022