Implementation of Multistream Transformers in PyTorch

Last update: Jul 26, 2022

Overview

Multistream Transformers

Implementation of Multistream Transformers in PyTorch.

This repository deviates slightly from the paper: instead of using the skip connection across all streams, it uses attention pooling across the tokens that share the same position. This has produced the best results in my experiments when the number of streams is greater than 2.
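
To make the idea of attention pooling across tokens at the same position more concrete, here is a minimal sketch. It is not the code used in this repository: the class name `StreamAttentionPool`, the single-head design, and the `(batch, num_streams, seq_len, dim)` layout are all assumptions made for illustration. At every position, each stream attends over the tokens that all streams hold at that same position.

```python
import torch
from torch import nn


class StreamAttentionPool(nn.Module):
    """Hypothetical single-head attention over the stream axis (illustration only)."""

    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)

    def forward(self, streams):
        # streams: (batch, num_streams, seq_len, dim) -- assumed layout
        q = self.to_q(streams)
        k, v = self.to_kv(streams).chunk(2, dim=-1)

        # similarity between stream i and stream j at the same position n
        sim = torch.einsum('bind,bjnd->bnij', q, k) * self.scale

        # normalize over the source streams j
        attn = sim.softmax(dim=-1)

        # weighted sum of values over the source streams, per position
        return torch.einsum('bnij,bjnd->bind', attn, v)


torch.manual_seed(0)
pool = StreamAttentionPool(dim=16)
streams = torch.randn(2, 3, 5, 16)   # 2 sequences, 3 streams, 5 positions
pooled = pool(streams)               # same shape as the input
```

Because the attention runs only over the stream axis, information is exchanged between streams but never between different positions in this step.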

Install

$ pip install multistream-transformers

Usage

import torch
from multistream_transformers import MultistreamTransformer

model = MultistreamTransformer(
    num_tokens = 256,         # number of tokens
    dim = 512,                # dimension
    depth = 4,                # depth
    causal = True,            # autoregressive or not
    max_seq_len = 1024,       # maximum sequence length
    num_streams = 2           # number of streams - 1 would make it a regular transformer
)

x = torch.randint(0, 256, (2, 1024))
mask = torch.ones((2, 1024)).bool()

logits = model(x, mask = mask) # (2, 1024, 256)
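
The example above uses an all-ones mask and stops at the logits. The sketch below shows one way a padding mask and a next-token loss could be built around such logits. It assumes that `True` in the mask marks a real token, which the all-ones mask suggests but this README does not state. The helpers `lengths_to_mask` and `next_token_loss` are hypothetical and not part of the package, and random logits stand in for the model output.

```python
import torch
import torch.nn.functional as F


def lengths_to_mask(lengths, max_len):
    # True for real tokens, False for padding (assumed convention)
    return torch.arange(max_len)[None, :] < lengths[:, None]


def next_token_loss(logits, tokens, mask):
    # position t predicts token t + 1
    pred = logits[:, :-1]
    target = tokens[:, 1:].clone()
    # padded targets are excluded from the loss via ignore_index
    target[~mask[:, 1:]] = -100
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),
        target.reshape(-1),
        ignore_index=-100,
    )


torch.manual_seed(0)
lengths = torch.tensor([1024, 700])            # second sequence is padded
mask = lengths_to_mask(lengths, 1024)          # (2, 1024), boolean
tokens = torch.randint(0, 256, (2, 1024))
logits = torch.randn(2, 1024, 256)             # stand-in for model(tokens, mask = mask)
loss = next_token_loss(logits, tokens, mask)   # scalar tensor
```

With `causal = True`, shifting the targets by one position is the usual way to train an autoregressive model; for a non-causal setup a different objective would be needed.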

Citations

@misc{burtsev2021multistream,
    title   = {Multi-Stream Transformers}, 
    author  = {Mikhail Burtsev and Anna Rumshisky},
    year    = {2021},
    eprint  = {2107.10342},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}

Releases(0.0.4)

0.0.4(Jul 31, 2021)

0.0.3(Jul 31, 2021)

0.0.2(Jul 30, 2021)

0.0.1(Jul 30, 2021)

Owner

Phil Wang