Scaling Vision with Sparse Mixture of Experts

This repository contains the code for training and fine-tuning Sparse MoE models for vision (V-MoE) on ImageNet-21k, reproducing the results presented in the paper:

Scaling Vision with Sparse Mixture of Experts, by Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby.

We will soon provide a colab analysing one of the models that we have released, as well as "config" files to train from scratch and fine-tune checkpoints. Stay tuned.

Installation

Simply clone this repository.

The file requirements.txt lists the requirements that can be installed from PyPI. However, we recommend installing jax, flax and optax directly from GitHub, since we use some recent features that are not part of any release yet.

In addition, you also have to clone the Vision Transformer repository, since we use some parts of it.

If you want to use RandAugment to train models (which we recommend if you train on ImageNet-21k or ILSVRC2012 from scratch), you must also clone the Cloud TPU repository, and name it cloud_tpu.
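Once everything is cloned and installed, a quick import check can confirm that the pieces described above are visible to Python. The following is a minimal sketch, not part of this repository: the helper name `find_missing_modules` is hypothetical, and the module name `vit_jax` is an assumption about how the Vision Transformer repository exposes its package. `cloud_tpu` is the directory name requested above and is only needed for RandAugment.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Assumed module names; adjust them to match your local checkout.
REQUIRED = ["jax", "flax", "optax", "vit_jax"]
OPTIONAL = ["cloud_tpu"]  # only needed when training with RandAugment

if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = find_missing_modules(names)
        print(f"Missing {label} modules: {missing if missing else 'none'}")
```

If a cloned repository is reported as missing, the directory that contains it probably needs to be added to PYTHONPATH.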

Checkpoints

We release checkpoints containing the weights of some models that we trained on ImageNet (either ILSVRC2012 or ImageNet-21k). Each checkpoint consists of an index file (with the .index extension) and one or more data files, called shards (with the extension .data-nnnnn-of-NNNNN). In the following list, we indicate only the prefix of each checkpoint. We recommend using gsutil to list and download the full set of files.

- V-MoE S/32, 8 experts on the last two odd blocks, trained from scratch on ILSVRC2012 with RandAugment: gs://vmoe_checkpoints/vmoe_s32_last2_ilsvrc2012_randaug_medium
- V-MoE B/16, 8 experts on every odd block, trained from scratch on ImageNet-21k with RandAugment: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong
  - Fine-tuned on ILSVRC2012: gs://vmoe_checkpoints/vmoe_b16_imagenet21k_randaug_strong_ft_ilsvrc2012
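Because each checkpoint is spread over an index file and several shards, it can help to verify that a download is complete. The sketch below is an illustration under stated assumptions, not code from this repository: the helper names `missing_checkpoint_files` and `gsutil_copy_command` are hypothetical, and the shard pattern is taken from the .data-nnnnn-of-NNNNN naming described above. The second helper only builds a `gsutil -m cp` argument list and does not run it.

```python
import re

# Shard names look like "<prefix>.data-00003-of-00016".
SHARD_RE = re.compile(r"^(?P<prefix>.+)\.data-(?P<idx>\d{5})-of-(?P<total>\d{5})$")


def missing_checkpoint_files(prefix, filenames):
    """Return the expected checkpoint files for `prefix` that are absent."""
    names = set(filenames)
    missing = []
    if prefix + ".index" not in names:
        missing.append(prefix + ".index")
    totals, found = set(), set()
    for name in names:
        match = SHARD_RE.match(name)
        if match and match.group("prefix") == prefix:
            totals.add(int(match.group("total")))
            found.add(int(match.group("idx")))
    if len(totals) != 1:
        raise ValueError(f"Cannot determine the shard count for {prefix!r}")
    total = totals.pop()
    for i in range(total):
        if i not in found:
            missing.append(f"{prefix}.data-{i:05d}-of-{total:05d}")
    return missing


def gsutil_copy_command(prefix, destination):
    """Build (but do not run) a gsutil command that copies one checkpoint."""
    # The ".*" suffix matches only this checkpoint's .index and .data-* files.
    return ["gsutil", "-m", "cp", prefix + ".*", destination]
```

The ".*" suffix in the copy pattern is a deliberate choice: a bare "*" after the B/16 prefix would also match the fine-tuned checkpoint, whose name starts with the same string. The returned list can be passed to `subprocess.run` when gsutil is installed.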

Disclaimers

This is not an officially supported Google product.

Owner

Google Research
