Vision Transformer Segmentation Network

This PyTorch implementation of ViT produces an output of the same size as the input in a simple, straightforward way: it applies the inverse of the patch rearrange operation to all the predicted outputs. This enables convolution-free multi-class segmentation.
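The rearrange-and-invert idea can be sketched as follows. This is an illustrative NumPy version, not the repository's code: the helper names `to_patches` and `from_patches` are hypothetical, and the axis order mirrors the einops pattern `b c (h p1) (w p2) -> b (h w) (p1 p2 c)` used in the referenced vit.py. It assumes the model predicts one vector of length `patch_size * patch_size * num_classes` per patch.

```python
import numpy as np


def to_patches(x, p):
    """(b, c, H, W) -> (b, num_patches, p * p * c)."""
    b, c, H, W = x.shape
    h, w = H // p, W // p
    x = x.reshape(b, c, h, p, w, p)      # split H into (h, p1) and W into (w, p2)
    x = x.transpose(0, 2, 4, 3, 5, 1)    # reorder to b, h, w, p1, p2, c
    return x.reshape(b, h * w, p * p * c)


def from_patches(t, p, c, H, W):
    """Inverse of to_patches: (b, num_patches, p * p * c) -> (b, c, H, W)."""
    b = t.shape[0]
    h, w = H // p, W // p
    t = t.reshape(b, h, w, p, p, c)      # b, h, w, p1, p2, c
    t = t.transpose(0, 5, 1, 3, 2, 4)    # reorder to b, c, h, p1, w, p2
    return t.reshape(b, c, H, W)


# Round trip with the default sizes: 112x112 image, 1 channel, 7x7 patches.
image = np.arange(112 * 112, dtype=np.float32).reshape(1, 1, 112, 112)
tokens = to_patches(image, 7)            # shape (1, 256, 49)
restored = from_patches(tokens, 7, 1, 112, 112)
print(tokens.shape, np.array_equal(restored, image))
```

For segmentation, the same inverse would be applied with `c` set to the number of output classes, so each token's prediction fills one patch of the output map.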

Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py

Default Architecture Parameters:

model = ViTSeg( image_size=112, 
                channels=1,
                patch_size=7, 
                num_classes=1, 
                dim=768, 
                depth=6, 
                heads=12, 
                mlp_dim=2048, 
                learned_pos=False, 
                use_token=False)

image_size: An integer or a tuple defining the size of the input image (accepting arbitrary image sizes would require some code changes)
channels: An integer defining the number of channels in the input image
patch_size: An integer or a tuple defining the size of the patches
num_classes: An integer representing the number of channels in the output
dim: An integer defining the size of the embedding dimension
depth: An integer defining the number of transformer layers
heads: An integer defining the number of attention heads in the transformer layers
mlp_dim: An integer defining the hidden size of the MLP in the transformer layers
learned_pos: A boolean which, if true, switches from fixed positional encodings to learned positional encodings
use_token: A boolean which, if true, adds a CLS token to the input and output
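To see how these parameters interact, the following pure-Python sketch computes the token layout implied by a given configuration. The helper `token_layout` and its returned keys are hypothetical and not part of the repository. It assumes the image size must be divisible by the patch size (as in the referenced vit.py) and that each output token holds `num_classes` values per patch pixel.

```python
def _pair(value):
    """Accept an int or a (height, width) tuple."""
    return value if isinstance(value, tuple) else (value, value)


def token_layout(image_size, patch_size, channels, num_classes, use_token=False):
    """Return the patch grid and per-token sizes for a configuration."""
    img_h, img_w = _pair(image_size)
    p_h, p_w = _pair(patch_size)
    if img_h % p_h or img_w % p_w:
        raise ValueError("image dimensions must be divisible by the patch size")
    grid = (img_h // p_h, img_w // p_w)
    num_patches = grid[0] * grid[1]
    return {
        "grid": grid,                                      # patches per column, per row
        "num_patches": num_patches,
        "seq_len": num_patches + (1 if use_token else 0),  # +1 for the CLS token
        "patch_dim": channels * p_h * p_w,                 # flattened input patch
        "out_patch_dim": num_classes * p_h * p_w,          # assumed flattened output patch
    }


# Default parameters: 112x112 input, 7x7 patches, 1 channel, 1 class.
print(token_layout(112, 7, channels=1, num_classes=1))
```

With the defaults this gives a 16x16 grid, that is 256 tokens of 49 values each, which are projected to the embedding size `dim` before entering the transformer.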

Citation

If you find this repository useful, please consider citing it:

@article{reynaud2021vitseg,
  title={ViTSeg-https://github.com/HReynaud/ViTSeg}, 
  url={https://github.com/HReynaud/ViTSeg},  
  Author={Reynaud, Hadrien}, 
  Year={2021}
}

A simple approach to enable dense segmentation with ViT.

Owner

HReynaud
