A PyTorch implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

Last update: Sep 20, 2022

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

A PyTorch implementation of TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? [1-2]. Unlike another unofficial PyTorch implementation [3], our version borrows heavily from the official implementation [4] and the TensorFlow implementation [5], and tries to stay consistent with them.

Usage

You can import the TokenLearner and TokenLearnerModuleV11 classes from the tokenlearner module. Either layer can be used with a Vision Transformer, MLPMixer, or Video Vision Transformer, as done in the paper.

import torch
from tokenlearner import TokenLearner

tklr = TokenLearner(in_channels=128, num_tokens=8, use_sum_pooling=False)

x = torch.ones(256, 32, 32, 128)  # [bs, h, w, c]
y1 = tklr(x)
print(y1.shape)  # [256, 8, 128]
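
The shape change above is where the savings come from: a 32 x 32 feature map holds 1024 spatial tokens, and the layer returns only 8. Since self-attention compares every token with every other token, the number of pairwise interactions in later layers shrinks quadratically. The short calculation below only illustrates that arithmetic for the shapes used in this example; the helper name attention_pairs is hypothetical and is not part of this repository.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs one self-attention layer evaluates."""
    return num_tokens * num_tokens


# Shapes taken from the usage example: a 32 x 32 map reduced to 8 tokens.
h, w, num_learned_tokens = 32, 32, 8

tokens_before = h * w               # 1024 spatial positions
tokens_after = num_learned_tokens   # 8 learned tokens

pairs_before = attention_pairs(tokens_before)
pairs_after = attention_pairs(tokens_after)

print(tokens_before, tokens_after)   # 1024 8
print(pairs_before // pairs_after)   # 16384
```

This counts attention pairs only. It is not a measurement of end-to-end speed, which also depends on the MLP blocks and on where the layer is placed in the network.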

You can also use TokenLearnerModuleV11, which aligns with the official implementation.

import torch
from tokenlearner import TokenLearnerModuleV11

tklr_v11 = TokenLearnerModuleV11(in_channels=128, num_tokens=8, num_groups=4, dropout_rate=0.)

tklr_v11.eval()  # eval mode disables dropout
x = torch.ones(256, 32, 32, 128)   # [bs, h, w, c]
y2 = tklr_v11(x)
print(y2.shape)  # [256, 8, 128]
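
For readers who want intuition for how a [bs, h, w, c] map becomes [bs, num_tokens, c], here is a minimal sketch of the general idea: predict one spatial weight map per output token, then pool the features with each map. It is a simplified stand-in, not the code in this repository. The class name SketchTokenLearner, the single linear layer, and the softmax over spatial positions are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class SketchTokenLearner(nn.Module):
    """Illustrative stand-in, NOT this repository's TokenLearner classes."""

    def __init__(self, in_channels: int, num_tokens: int):
        super().__init__()
        self.norm = nn.LayerNorm(in_channels)
        # One logit per learned token at every spatial position.
        self.to_logits = nn.Linear(in_channels, num_tokens)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bs, h, w, c = x.shape                      # [bs, h, w, c]
        feats = x.reshape(bs, h * w, c)            # [bs, h*w, c]
        logits = self.to_logits(self.norm(feats))  # [bs, h*w, num_tokens]
        # Assumption of this sketch: normalize each token's map over space.
        weights = logits.softmax(dim=1)
        # Weighted sum over spatial positions gives one vector per token.
        return torch.einsum("bns,bnc->bsc", weights, feats)


sketch = SketchTokenLearner(in_channels=128, num_tokens=8)
x = torch.ones(4, 32, 32, 128)  # [bs, h, w, c]
y = sketch(x)
print(y.shape)  # torch.Size([4, 8, 128])
```

Because each weight map sums to 1 over the spatial positions, every output token is a weighted average of the input features. The real modules use deeper attention branches and extra options such as use_sum_pooling, num_groups, and dropout_rate, so use the classes from tokenlearner for actual experiments.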

References

[1] TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?; Ryoo et al.; arXiv 2021; https://arxiv.org/abs/2106.11297

[2] TokenLearner: Adaptive Space-Time Tokenization for Videos; Ryoo et al.; NeurIPS 2021; https://openreview.net/forum?id=z-l1kpDXs88

[3] Unofficial PyTorch implementation

[4] official implementation

[5] TensorFlow implementation

Owner

Caiyong Wang
