RVT: Robust Vision Transformers

This repository contains PyTorch code for Robust Vision Transformers.

For details, see "Rethinking the Design Principles of Robust Vision Transformer" by Xiaofeng Mao, Gege Qi, Yuefeng Chen, Yuan He and Hui Xue.

Usage

First, clone the repository locally:

git clone https://github.com/vtddggg/Robust-Vision-Transformer.git

Then, install PyTorch 1.7.0+, torchvision 0.8.1+ and pytorch-image-models (timm) 0.3.2:

conda install -c pytorch pytorch torchvision
pip install timm==0.3.2
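Before you start training, you may want to check that the installed versions meet the requirements above. The following is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+). The `REQUIREMENTS` table restates this README's constraints. The helper names are illustrative. The simplified parser ignores local and pre-release suffixes such as `+cu110`.

```python
# Sketch: check installed versions against this README's constraints
# (torch >= 1.7.0, torchvision >= 0.8.1, timm pinned to 0.3.2).
# Helper names are illustrative, not part of the repository.
from importlib import metadata
import re

REQUIREMENTS = {
    "torch": (">=", (1, 7, 0)),
    "torchvision": (">=", (0, 8, 1)),
    "timm": ("==", (0, 3, 2)),
}


def parse_version(text):
    """Turn '1.7.1+cu110' into (1, 7, 1); local/pre-release suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # treat '2.0' as (2, 0, 0)
    return tuple(parts)


def check(installed):
    """Return a list of problems for a {package: version_string} mapping."""
    problems = []
    for name, (op, wanted) in REQUIREMENTS.items():
        if name not in installed:
            problems.append(f"{name} is not installed")
            continue
        have = parse_version(installed[name])
        ok = have >= wanted if op == ">=" else have == wanted
        if not ok:
            wanted_str = ".".join(map(str, wanted))
            problems.append(f"{name} {installed[name]} does not satisfy {op}{wanted_str}")
    return problems


def installed_versions():
    """Read versions of the installed distributions, skipping missing ones."""
    found = {}
    for name in REQUIREMENTS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    issues = check(installed_versions())
    print("\n".join(issues) if issues else "environment looks OK")
```

The timm entry uses an exact match because the README pins version 0.3.2 rather than giving a minimum.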

We use 4 nodes with 8 GPUs each (32 GPUs in total) to train RVT-Ti, RVT-S and RVT-B:

Training RVT-Ti

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_tiny --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-S

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_small --data-path /path/to/imagenet --output_dir output --dist-eval

Training RVT-B

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 main.py --model rvt_base --data-path /path/to/imagenet --output_dir output --batch-size 32 --dist-eval
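The commands above pass `--nnodes=4` to the launcher but omit `--node_rank`, `--master_addr` and `--master_port`. In a real 4-node run, every node runs the same command with its own `--node_rank` (0 to 3) and points to a shared master address and port. The minimal sketch below generates those per-node commands. It also computes the global batch size, assuming that `--batch-size` is the per-GPU batch size. `MASTER_ADDR` is a hypothetical placeholder, and 29500 is the launcher's default port.

```python
# Sketch: build the per-node torch.distributed.launch command for the
# 4-node x 8-GPU setup, and compute the resulting global batch size.
# MASTER_ADDR is hypothetical; 29500 is the launcher's default master port.
# Assumption: --batch-size is the per-GPU (per-process) batch size.
import shlex

MASTER_ADDR = "10.0.0.1"  # hypothetical: replace with the rank-0 node's address
MASTER_PORT = 29500

EXTRA_ARGS = {
    # model -> extra main.py arguments used in the commands above
    "rvt_tiny": [],
    "rvt_small": [],
    "rvt_base": ["--batch-size", "32"],
}


def launch_command(model, node_rank, nnodes=4, nproc_per_node=8,
                   data_path="/path/to/imagenet", output_dir="output"):
    """Return the argv that node `node_rank` should run."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes})")
    launcher = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={MASTER_ADDR}",
        f"--master_port={MASTER_PORT}",
    ]
    script = ["main.py", "--model", model,
              "--data-path", data_path, "--output_dir", output_dir]
    return launcher + script + EXTRA_ARGS[model] + ["--dist-eval"]


def global_batch_size(per_gpu_batch, nnodes=4, nproc_per_node=8):
    """One process per GPU, so each step sees per_gpu_batch x world size samples."""
    return per_gpu_batch * nnodes * nproc_per_node


if __name__ == "__main__":
    for rank in range(4):
        print(f"# node {rank}:", shlex.join(launch_command("rvt_base", rank)))
    print("RVT-B global batch:", global_batch_size(32))  # 32 x 32 = 1024
```

Run the rank-0 command on the machine whose address you set as `MASTER_ADDR`, and run the other commands on the remaining nodes.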

If you want to train RVT-Ti*, RVT-S* or RVT-B*, add --use_mask and --use_patch_aug to enable position-aware attention scaling and patch-wise augmentation.
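The starred variants use the same training commands plus the two extra `main.py` flags. The small helper below appends those flags and skips any that are already present. `starred` is a hypothetical name used only for illustration and is not part of the repository.

```python
# Sketch: derive an RVT-Ti*/S*/B* command from a base training command by
# appending the two flags named in this README. `starred` is hypothetical.
import shlex

STAR_FLAGS = ["--use_mask", "--use_patch_aug"]


def starred(cmd):
    """Return a copy of cmd with --use_mask and --use_patch_aug appended once."""
    return list(cmd) + [flag for flag in STAR_FLAGS if flag not in cmd]


if __name__ == "__main__":
    # Base command: the RVT-Ti line above.
    base = shlex.split(
        "python -m torch.distributed.launch --nproc_per_node=8 --nnodes=4 "
        "main.py --model rvt_tiny --data-path /path/to/imagenet "
        "--output_dir output --dist-eval"
    )
    print(shlex.join(starred(base)))
```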
