Unsupervised Learning of Video Representations using LSTMs

Last update: Dec 20, 2022

Related tags

Overview

Unsupervised Learning of Video Representations using LSTMs

Code for paper Unsupervised Learning of Video Representations using LSTMs by Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov; ICML 2015.

We use multilayer Long Short Term Memory (LSTM) networks to learn representations of video sequences. The representation can be used to perform different tasks, such as reconstructing the input sequence, predicting the future sequence, or for classification. Examples:

Note that the code at this link is deprecated.

Getting Started

To compile cudamat library you need to modify CUDA_ROOT in cudamat/Makefile to the relevant cuda root path.

The libraries you need to install are:

h5py (HDF5 (>= 1.8.11))
google.protobuf (Protocol Buffers (>= 2.5.0))
numpy
matplotlib

Next compile .proto file by calling

protoc -I=./ --python_out=./ config.proto

Depending on the task, you would need to download the following dataset files. These can be obtained by running:

wget http://www.cs.toronto.edu/~emansim/datasets/mnist.h5
wget http://www.cs.toronto.edu/~emansim/datasets/bouncing_mnist_test.npy
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_train_patches.npy
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_valid_patches.npy
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_train_features.h5
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_train_labels.txt
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_train_num_frames.txt
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_valid_features.h5
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_valid_labels.txt
wget http://www.cs.toronto.edu/~emansim/datasets/ucf101_sample_valid_num_frames.txt

Note to Toronto users: You don't need to download any files, as they are available in my gobi3 repository and are already set up.

Bouncing (Moving) MNIST dataset

To train a sample model on this dataset you need to set correct data_file in datasets/bouncing_mnist_valid.pbtxt and then run (you may need to change the board id of gpu):

python lstm_combo.py models/lstm_combo_1layer_mnist.pbtxt datasets/bouncing_mnist.pbtxt datasets/bouncing_mnist_valid.pbtxt 1

After training the model and setting correct path to trained weights in models/lstm_combo_1layer_mnist_pretrained.pbtxt, you can visualize the sample reconstruction and future prediction results of the pretrained model by running:

python display_results.py models/lstm_combo_1layer_mnist_pretrained.pbtxt datasets/bouncing_mnist_valid.pbtxt 1

Below are the sample results, where first image is reference image and second image is prediction of the model. Note that first ten frames are reconstructions, whereas the last ten frames are future predictions.

Video patches

Due to the size constraints, I only managed to upload a small sample dataset of UCF-101 patches. The trained model is overfitting, so this example is just meant for instructional purposes. The setup is the same as in Bouncing MNIST dataset.

To train the model run:

python lstm_combo.py models/lstm_combo_1layer_ucf101_patches.pbtxt datasets/ucf101_patches.pbtxt datasets/ucf101_patches_valid.pbtxt 1

To see the results run:

python display_results.py models/lstm_combo_1layer_ucf101_pretrained.pbtxt datasets/ucf101_patches_valid.pbtxt 1

Classification using high level representations ('percepts') of video frames

Again, as in the case of UCF-101 patches, I was able to upload a very small subset of fc6 features of video frames extracted using VGG network. To train the classifier run:

python lstm_classifier.py models/lstm_classifier_1layer_ucf101_features.pbtxt datasets/ucf101_features.pbtxt datasets/ucf101_features_valid.pbtxt 1

Reference

If you found this code or our paper useful, please consider citing the following paper:

@inproceedings{srivastava15_unsup_video,
  author    = {Nitish Srivastava and Elman Mansimov and Ruslan Salakhutdinov},
  title     = {Unsupervised Learning of Video Representations using {LSTM}s},
  booktitle = {ICML},
  year      = {2015}
}

Unsupervised Learning of Video Representations using LSTMs

Related tags

Overview

Unsupervised Learning of Video Representations using LSTMs

Getting Started

Bouncing (Moving) MNIST dataset

Video patches

Classification using high level representations ('percepts') of video frames

Reference

Owner

Elman Mansimov

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

A geometric deep learning pipeline for predicting protein interface contacts.

This is the 3D Implementation of 《Inconsistency-aware Uncertainty Estimation for Semi-supervised Medical Image Segmentation》

A project to make Amazon Echo respond to sign language using your webcam

Data-Uncertainty Guided Multi-Phase Learning for Semi-supervised Object Detection

Part-Aware Data Augmentation for 3D Object Detection in Point Cloud

Pywonderland - A tour in the wonderland of math with python.

TextureGAN in Pytorch

Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality

A MNIST-like fashion product database. Benchmark

Repo for CVPR2021 paper "QPIC: Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information"

Code for generating the figures in the paper "Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?"

PyTorch implementation of Glow

Ludwig is a toolbox that allows to train and evaluate deep learning models without the need to write code.

UltraGCN: An Ultra Simplification of Graph Convolutional Networks for Recommendation

Scalable Optical Flow-based Image Montaging and Alignment

DeceFL: A Principled Decentralized Federated Learning Framework

Dataset and Code for ICCV 2021 paper "Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme"

Useful materials and tutorials for 110-1 NTU DBME5028 (Application of Deep Learning in Medical Imaging)

Official PyTorch Implementation of Rank & Sort Loss [ICCV2021]