Hard CATER examples from the Hopper ICLR paper

CATER-h

NEC Laboratories America, Inc.

Honglu Zhou*, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf

(*Contact: [email protected])

CATER-h is a dataset proposed for video reasoning, specifically the problem of object permanence, as investigated in "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", accepted to ICLR 2021. Please refer to the full paper for detailed analysis and evaluations.

1. Overview

This repository provides the CATER-h dataset used in the paper "Hopper: Multi-hop Transformer for Spatiotemporal Reasoning", as well as the instructions and code to create it.

If you find the dataset or the code helpful, please cite:

Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf. Hopper: Multi-hop Transformer for Spatiotemporal Reasoning. In International Conference on Learning Representations (ICLR), 2021.

@inproceedings{zhou2021caterh,
    title = {{Hopper: Multi-hop Transformer for Spatiotemporal Reasoning}},
    author = {Zhou, Honglu and Kadav, Asim and Lai, Farley and Niculescu-Mizil, Alexandru and Min, Martin Renqiang and Kapadia, Mubbasir and Graf, Hans Peter},
    booktitle = {ICLR},
    year = 2021
}  

2. Dataset

A pre-generated sample of the dataset used in the paper is provided here. To generate your own version of the dataset, follow the instructions in the sections below.

3. Requirements

  1. All CLEVR requirements (e.g., Blender; the code was used with v2.79b).
  2. This code was used on Linux machines.
  3. GPU: This code was tested with multiple types of GPUs and should be compatible with most GPUs. By default it will use all the GPUs on the machine.
  4. All DETR requirements. You can check the site-packages of the conda environment we used (Python 3.7.6).

4. Generating CATER-h

4.1 Generating videos and labels

(We modify code provided by CATER.)

  1. cd generate/

  2. echo $PWD >> blender-2.79b-linux-glibc219-x86_64/2.79/python/lib/python3.5/site-packages/clevr.pth (You can download our blender-2.79b-linux-glibc219-x86_64.)

  3. Run time python launch.py to start generating. Read through the script first and adjust any settings and paths as needed. The command-line options are documented in the script (e.g., --num_images specifies the number of videos to generate).

  4. Run time python gen_train_test.py to generate the labels for each of the tasks. Change the parameters at the top of the file before running it.
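
Step 2 above relies on Python's .pth mechanism: each line of a .pth file inside site-packages is added to sys.path when the interpreter starts, which is what lets Blender's bundled Python import the code in generate/. The following is a minimal Python sketch of that same step, written so that rerunning it does not append a duplicate entry. The function name register_clevr_path and its arguments are illustrative assumptions and are not part of this repository.

```python
from pathlib import Path


def register_clevr_path(generate_dir, site_packages):
    """Append generate_dir to clevr.pth inside site_packages.

    Hypothetical helper mirroring `echo $PWD >> .../site-packages/clevr.pth`,
    except that an entry already present is not added a second time.
    """
    pth_file = Path(site_packages) / "clevr.pth"
    entry = str(Path(generate_dir).resolve())

    text = pth_file.read_text() if pth_file.exists() else ""
    if entry in text.splitlines():
        return pth_file  # already registered, nothing to do

    with pth_file.open("a") as f:
        # Keep the new entry on its own line even if the file
        # did not end with a newline.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(entry + "\n")
    return pth_file
```

Called from inside generate/ as register_clevr_path(".", "blender-2.79b-linux-glibc219-x86_64/2.79/python/lib/python3.5/site-packages"), this should have the same effect as the echo command in step 2.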

4.2 Obtaining frame and object features

You can find our extracted frame and object features here. The CNN backbone we utilized to obtain the frame features is a pre-trained ResNeXt-101 model. We use DETR trained on the LA-CATER dataset to obtain object features.

4.3 Filtering data by the frame index of the last visible snitch

  1. cd extract/

  2. Download our pretrained object detector from here. Create a folder named checkpoints and put the pretrained object detector in it.

  3. Update the paths and other settings in extract/configs/CATER-h.yml.

  4. Run time ./run.sh

This will generate an output folder with pickle files that save the frame index of the last visible snitch and the detector's confidence.

  5. Run resample.ipynb, which resamples the data so that the train/val sets are balanced in terms of the class label and the frame index of the last visible snitch.
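
The balancing done by resample.ipynb can be illustrated with a small sketch. This is not the notebook's code; it only shows one simple way to balance a split, under the assumptions that each video is described by a hypothetical (video_id, class_label, last_visible_frame) record, that frame indices are grouped into fixed-width bins, and that balance is enforced jointly over (class label, frame bin). Every such group is downsampled to the size of the smallest group.

```python
import random
from collections import defaultdict


def balance_by_label_and_frame(records, bin_width=30, seed=0):
    """Downsample so every (label, frame-bin) group has the same size.

    records: iterable of (video_id, class_label, last_visible_frame) tuples.
    bin_width and seed are illustrative defaults, not values from the paper.
    Returns the list of retained video ids.
    """
    # Group video ids by class label and by the bin of the frame index
    # at which the snitch was last visible.
    groups = defaultdict(list)
    for video_id, label, last_frame in records:
        groups[(label, last_frame // bin_width)].append(video_id)

    if not groups:
        return []

    # Every group is reduced to the size of the smallest one.
    target = min(len(ids) for ids in groups.values())
    rng = random.Random(seed)  # fixed seed for reproducible sampling

    balanced = []
    for key in sorted(groups):
        balanced.extend(rng.sample(groups[key], target))
    return balanced
```

Downsampling to the smallest group is the simplest choice but discards data; oversampling rare groups or balancing the two attributes separately are alternatives that may be closer to what a given experiment needs.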

Acknowledgments

The code in this repository is heavily based on the following publicly available implementations:
