Clockwork Variational Autoencoders (CW-VAE)

Vaibhav Saxena, Jimmy Ba, Danijar Hafner

If you find this code useful, please cite it in your paper:

@article{saxena2021clockworkvae,
  title={Clockwork Variational Autoencoders}, 
  author={Saxena, Vaibhav and Ba, Jimmy and Hafner, Danijar},
  journal={arXiv preprint arXiv:2102.09532},
  year={2021},
}

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences in the future, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
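
To make the idea of "different clock speeds" concrete, here is a minimal sketch of a tick schedule. It assumes that level `l` of the hierarchy updates once every `k ** l` time steps, where `k` is a hypothetical temporal abstraction factor. The function name, `num_levels`, and `k` are illustrative and are not taken from this repository's code or configs.

```python
def active_levels(t, num_levels=3, k=2):
    """Return the indices of the levels that update at time step t.

    Assumption for illustration: level l ticks every k**l steps, so
    level 0 updates at every frame and higher levels update less often.
    """
    return [level for level in range(num_levels) if t % (k ** level) == 0]


# Print the schedule for the first eight frames.
for t in range(8):
    print(t, active_levels(t))
```

Under these assumptions the lowest level changes at every frame, while the highest level holds its state for `k ** (num_levels - 1)` frames at a time, which is what lets it track slowly changing content.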

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
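
The objective described above can be sketched as follows. The notation is an assumption made for illustration and may differ from the paper: $x_t$ is the image at step $t$, $s_t^l$ is the stochastic state of level $l$ at step $t$, $\mathcal{T}_l$ is the set of steps at which level $l$ updates, and the conditioning of the posterior $q$ and prior $p$ is abbreviated with a dot.

```latex
\mathcal{L}_{\mathrm{ELBO}}
  = \sum_{t=1}^{T} \mathbb{E}_{q}\!\left[ \ln p\!\left(x_t \mid s_t^{1}\right) \right]
  - \sum_{l=1}^{L} \sum_{t \in \mathcal{T}_l}
    \mathbb{E}_{q}\!\left[
      \mathrm{KL}\!\left( q\!\left(s_t^{l} \mid \cdot\right) \,\middle\|\, p\!\left(s_t^{l} \mid \cdot\right) \right)
    \right]
```

The first sum is the reconstruction term for each image, and the double sum is the KL regularizer for each stochastic variable in the model.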

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml

The evaluation script writes open-loop video predictions in both PNG and NPZ formats, along with plots of PSNR and SSIM, to the data directory.

python3 eval.py --logdir /path/to/logdir
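
For readers unfamiliar with PSNR, the following is a minimal sketch of the standard definition applied to two flat lists of pixel values. It is not the code used by `eval.py`. The function name `psnr` and the assumption that pixel values lie in `[0, max_value]` are illustrative.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Assumes pixel values are scaled to the range [0, max_value].
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate predictions that are closer to the ground-truth frames.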
