A collection of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, and video-language transformers, together with related datasets.

Last update: Dec 21, 2022

Overview

Reading list in Transformer

We are a team from the KAUST Vision-CAIR group, and we focus on multi-modal representation learning.

This repo aims to collect recent popular Transformer papers, code, and learning resources in the domains of Vision Transformers, NLP, multi-modal learning, and more.

Recent News

CVPR multi-modal papers are collected here

The code of VisualGPT is open-sourced. It can be found here

The code and paper of LeViT are publicly available. They can be found here

The paper MLP-Mixer: An all-MLP Architecture for Vision is available here

The code and paper of MDETR are publicly available. They can be found here

The code and paper of RelTransformer are publicly available. They can be found here

The code and paper of Twins-SVT are publicly available. They can be found here

Resources on Vision Transformers for deepfake detection can be found here

The code of VideoGPT is open-sourced. It can be found here

The code of CoaT is open-sourced. It can be found here

The code of Kaleido-BERT is open-sourced. It can be found here

The code of TimeSformer is open-sourced. It can be found here

The code of Swin Transformer is open-sourced. It can be found here

Topics (paper and code)

Review Paper in multi-modal

Video-language

Tutorials and workshop

Datasets

Multi-modal Datasets

Blogs

Lil's blogs

Tools

PyTorchVideo: a deep learning library for video understanding research
Horovod: a tool for multi-GPU parallel processing
Accelerate: an easy API for mixed precision and any kind of distributed computing
Optuna: hyperparameter search
AI Conference Deadlines

Owner

Jun Chen
