Official implementation of "Refiner: Refining Self-attention for Vision Transformers".

Overview

RefinerViT

This repo is the official implementation of "Refiner: Refining Self-attention for Vision Transformers". The repo is build on top of timm and include the relabbeling trick included in TokenLabelling.

Introduction

Refined Vision Transformer is initially described in arxiv, which observes vision transformers require much more datafor model pre-training. Most of recent works thus are dedicated to designing morecomplex architectures or training methods to address the data-efficiency issue ofViTs. However, few of them explore improving the self-attention mechanism, akey factor distinguishing ViTs from CNNs. Different from existing works, weintroduce a conceptually simple scheme, calledrefiner, to directly refine the self-attention maps of ViTs. Specifically, refiner exploresattention expansionthatprojects the multi-head attention maps to a higher-dimensional space to promotetheir diversity. Further, refiner applies convolutions to augment local patternsof the attention maps, which we show is equivalent to adistributed local atten-tion—features are aggregated locally with learnable kernels and then globallyaggregated with self-attention. Extensive experiments demonstrate that refinerworks surprisingly well. Significantly, it enables ViTs to achieve 86% top-1 classifi-cation accuracy on ImageNet with only 81M parameters.

Please run git clone with --recursive to clone timm as submodule and install it with cd pytorch-image-models && pip install -e ./

Requirements

torch>=1.4.0 torchvision>=0.5.0 pyyaml numpy timm==0.4.5

A summary of the results are shown below for quick reference. Details can be found in the paper.

Model head layer dim Image resolution Param Top 1
Refiner-ViT-S 12 16 384 224 25M 83.6
Refiner-ViT-S 12 16 384 384 25M 84.6
Refiner-ViT-M 12 32 420 224 55M 84.6
Refiner-ViT-M 12 32 420 384 55M 85.6
Refiner-ViT-L 16 32 512 224 81M 84.9
Refiner-ViT-L 16 32 512 384 81M 85.8
Refiner-ViT-L 16 32 512 448 81M 86.0

Training

Train the Refiner-ViT-S from scratch:

bash run.sh scripts/refiner_s.yaml 

To use the re-labbeling tricks for improving the accuracy, download the relabel_data based on NFNet. This is provided in TokenLabelling repo. Then, copy the relabbeling data to the data folder.

An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础,融合了混

Houde Qian 5 May 15, 2022
DeepMoCap: Deep Optical Motion Capture using multiple Depth Sensors and Retro-reflectors

DeepMoCap: Deep Optical Motion Capture using multiple Depth Sensors and Retro-reflectors By Anargyros Chatzitofis, Dimitris Zarpalas, Stefanos Kollias

tofis 24 Oct 08, 2022
Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

Explicable Reward Design for Reinforcement Learning Agents [NeurIPS'21]

3 May 12, 2022
Code to compute permutation and drop-column importances in Python scikit-learn models

Feature importances for scikit-learn machine learning models By Terence Parr and Kerem Turgutlu. See Explained.ai for more stuff. The scikit-learn Ran

Terence Parr 537 Dec 31, 2022
MAg: a simple learning-based patient-level aggregation method for detecting microsatellite instability from whole-slide images

MAg Paper Abstract File structure Dataset prepare Data description How to use MAg? Why not try the MAg_lib! Trained models Experiment and results Some

Calvin Pang 3 Apr 08, 2022
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

58 Jan 06, 2023
PassAPI is a password generator in hash format and fully developed in Python, with the aim of teaching how to handle and build

simple, elegant and safe Introduction PassAPI is a password generator in hash format and fully developed in Python, with the aim of teaching how to ha

Johnsz 2 Mar 02, 2022
Build fully-functioning computer vision models with PyTorch

Detecto is a Python package that allows you to build fully-functioning computer vision and object detection models with just 5 lines of code. Inferenc

Alan Bi 576 Dec 29, 2022
code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

75 Dec 16, 2022
Official implementation of Sparse Transformer-based Action Recognition

STAR Official implementation of S parse T ransformer-based A ction R ecognition Dataset download NTU RGB+D 60 action recognition of 2D/3D skeleton fro

Chonghan_Lee 15 Nov 02, 2022
Deep motion generator collections

GenMotion GenMotion (/gen’motion/) is a Python library for making skeletal animations. It enables easy dataset loading and experiment sharing for synt

23 May 24, 2022
Mall-Customers-Segmentation - Customer Segmentation Using K-Means Clustering

Overview Customer Segmentation is one the most important applications of unsupervised learning. Using clustering techniques, companies can identify th

NelakurthiSudheer 2 Jan 03, 2022
RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds

RipsNet: a general architecture for fast and robust estimation of the persistent homology of point clouds This repository contains the code asscoiated

Felix Hensel 14 Dec 12, 2022
RodoSol-ALPR Dataset

RodoSol-ALPR Dataset This dataset, called RodoSol-ALPR dataset, contains 20,000 images captured by static cameras located at pay tolls owned by the Ro

Rayson Laroca 45 Dec 15, 2022
.NET bindings for the Pytorch engine

TorchSharp TorchSharp is a .NET library that provides access to the library that powers PyTorch. It is a work in progress, but already provides a .NET

Matteo Interlandi 17 Aug 30, 2021
This is an official repository of CLGo: Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints

CLGo This is an official repository of CLGo: Learning to Predict 3D Lane Shape and Camera Pose from a Single Image via Geometry Constraints An earlier

刘芮金 32 Dec 20, 2022
Official Pytorch Implementation of Unsupervised Image Denoising with Frequency Domain Knowledge

Unsupervised Image Denoising with Frequency Domain Knowledge (BMVC 2021 Oral) : Official Project Page This repository provides the official PyTorch im

Donggon Jang 12 Sep 26, 2022
Implementation for On Provable Benefits of Depth in Training Graph Convolutional Networks

Implementation for On Provable Benefits of Depth in Training Graph Convolutional Networks Setup This implementation is based on PyTorch = 1.0.0. Smal

Weilin Cong 8 Oct 28, 2022
这是一个yolox-pytorch的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Pytorch当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤

Bubbliiiing 613 Jan 05, 2023
Deep learning PyTorch library for time series forecasting, classification, and anomaly detection

Deep learning for time series forecasting Flow forecast is an open-source deep learning for time series forecasting framework. It provides all the lat

AIStream 1.2k Jan 04, 2023