Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Overview

Big Vision

This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and TensorFlow Datasets for scalable input pipelines in the Cloud.

The open-sourcing of this codebase has two main purposes:

  1. Publishing the code of research projects developed in this codebase (see a list below).
  2. Providing a strong starting point for running large-scale vision experiments on Google Cloud TPUs, which should scale seamlessly and out-of-the box from a single TPU core to a distributed setup with up to 2048 TPU cores.

Note, that despite being TPU-centric, our codebase should in general support CPU, GPU and single-host multi-GPU training, thanks to JAX' well-executed and transparent support for multiple backends.

big_vision aims to support research projects at Google. We are unlikely to work on feature requests or accept external contributions, unless they were pre-approved (ask in an issue first). For a well-supported transfer-only codebase, see also vision_transformer.

The following research projects were originally conducted in the big_vision codebase:

Architecture research

Multimodal research

Knowledge distillation

Misc

  • Are we done with ImageNet?, by Lucas Beyer*, Olivier J. Hénaff*, Alexander Kolesnikov*, Xiaohua Zhai*, and Aäron van den Oord*

Codebase high-level organization and principles in a nutshell

The main entry point is a trainer module, which typically does all the boilerplate related to creating a model and an optimizer, loading the data, checkpointing and training/evaluating the model inside a loop. We provide the canonical trainer train.py in the root folder. Normally, individual projects within big_vision fork and customize this trainer.

All models, evaluators and preprocessing operations live in the corresponding subdirectories and can often be reused between different projects. We encourage compatible APIs within these directories to facilitate reusability, but it is not strictly enforced, as individual projects may need to introduce their custom APIs.

We have a powerful configuration system, with the configs living in the configs/ directory. Custom trainers and modules can seamlessly extend/modify the configuration options.

Training jobs are robust to interruptions and will resume seamlessly from the last saved checkpoint (assuming user provides the correct --workdir path).

Each configuration file contains a comment at the top with a COMMAND snippet to run it, and some hint of expected runtime and results. See below for more details, but generally speaking, running on a GPU machine involves calling python -m COMMAND while running on TPUs, including multi-host, involves

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all
  --command "bash big_vision/run_tpu.sh COMMAND"

See instructions below for more details on how to use Google Cloud TPUs.

Current and future contents

The first release contains the core part of pre-training, transferring, and evaluating classification models at scale on Cloud TPU VMs.

Features and projects we plan to release in the near future, in no particular order:

  • ImageNet-21k in TFDS.
  • MLP-Mixer.
  • Loading misc public models used in our publications (NFNet, MoCov3, DINO).
  • Contrastive Image-Text model training and evaluation as in LiT and CLIP.
  • "Patient and consistent" distillation.
  • Memory-efficient Polyak-averaging implementation.
  • Advanced JAX compute and memory profiling. We are using internal tools for this, but may eventually add support for the publicly available ones.

We will continue releasing code of our future publications developed within big_vision here.

Non-content

The following exist in the internal variant of this codebase, and there is no plan for their release:

  • Regular regression tests for both quality and speed. They rely heavily on internal infrastructure.
  • Advanced logging, monitoring, and plotting of experiments. This also relies heavily on internal infrastructure. However, we are open to ideas on this and may add some in the future, especially if implemented in a self-contained manner.
  • Not yet published, ongoing research projects.

Running on Cloud TPU VMs

Create TPU VMs

To create a single machine with 8 TPU cores, follow the following Cloud TPU JAX document: https://cloud.google.com/tpu/docs/run-calculation-jax

To support large-scale vision research, more cores with multiple hosts are recommended. Below we provide instructions on how to do it.

First, create some useful variables, which we be reused:

export NAME="a name of the TPU deployment, e.g. my-tpu-machine"
export ZONE="GCP geographical zone, e.g. europe-west4-a"
export GS_BUCKET_NAME="Name of the storage bucket, e.g. my_bucket"

The following command line will create TPU VMs with 32 cores, 4 hosts.

gcloud alpha compute tpus tpu-vm create $NAME --zone $ZONE --accelerator-type v3-32 --version tpu-vm-tf-2.8.0

Install big_vision on TPU VMs

Fetch the big_vision repository, copy it to all TPU VM hosts, and install dependencies.

git clone --branch=master https://github.com/google-research/big_vision
gcloud alpha compute tpus tpu-vm scp --recurse big_vision/big_vision $NAME: --worker=all --zone=$ZONE
gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "bash big_vision/run_tpu.sh"

Download and prepare TFDS datasets

Everything in this section you need to do only once, and, alternatively, you can also do it on your local machine and copy the result to the cloud bucket. For convenience, we provide instructions on how to prepare data using Cloud TPUs.

Download and prepare TFDS datasets using a single worker. Seven TFDS datasets used during evaluations will be generated under ~/tensorflow_datasets/ (should take 10-15 minutes in total).

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "bash big_vision/run_tpu.sh big_vision.tools.download_tfds_datasets cifar10 cifar100 oxford_iiit_pet oxford_flowers102 cars196 dtd uc_merced"

Copy the datasets to GS bucket, to make them accessible to all TPU workers.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=0 --command "rm -r ~/tensorflow_datasets/downloads && gsutil cp -r ~/tensorflow_datasets gs://$GS_BUCKET_NAME"

If you want to integrate other public or custom datasets, i.e. imagenet2012, please follow the official guideline.

Pre-trained models

For the full list of pre-trained models check out the load function defined in the same module as the model code. And for example config on how to use these models, see configs/transfer.py.

Run the transfer script on TPU VMs

The following command line fine-tunes a pre-trained vit-i21k-augreg-b/32 model on cifar10 dataset.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/transfer.py:model=vit-i21k-augreg-b/32,dataset=cifar10,crop=resmall_crop --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'` --config.lr=0.03"

Run the train script on TPU VMs

To train your own big_vision models on a large dataset, e.g. imagenet2012 (prepare the TFDS dataset), run the following command line.

gcloud alpha compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/tensorflow_datasets bash big_vision/run_tpu.sh big_vision.train --config big_vision/configs/bit_i1k.py  --workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'`"

ViT baseline

We provide a well-tuned ViT-S/16 baseline in the config file named vit_s16_i1k.py. It achieves 76.5% accuracy on ImageNet validation split in 90 epochs of training, being a strong and simple starting point for research on the ViT models.

Please see our arXiv note for more details and if this baseline happens to by useful for your research, consider citing

@article{vit_baseline,
  url = {https://arxiv.org/abs/2205.01580},
  author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
  title = {Better plain ViT baselines for ImageNet-1k},
  journal={arXiv preprint arXiv:2205.01580},
  year = {2022},
}

Citing the codebase

If you found this codebase useful for your research, please consider using the following BibTEX to cite it:

@misc{big_vision,
  author = {Beyer, Lucas and Zhai, Xiaohua and Kolesnikov, Alexander},
  title = {Big Vision},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/google-research/big_vision}}
}

Disclaimer

This is not an official Google Product.

Owner
Google Research
Google Research
Using this codebase as a tool for my own research. Making some modifications to the original repo for my own purposes.

For SwapNet Create a list.txt file containing all the images to process. This can be done with the GNU find command: find path/to/input/folder -name '

Andrew Jong 2 Nov 10, 2021
Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Official code release for 3DV 2021 paper Human Performance Capture from Monocular Video in the Wild.

Chen Guo 58 Dec 24, 2022
torchbearer: A model fitting library for PyTorch

Note: We're moving to PyTorch Lightning! Read about the move here. From the end of February, torchbearer will no longer be actively maintained. We'll

631 Jan 04, 2023
SSD-based Object Detection in PyTorch

SSD-based Object Detection in PyTorch 서강대학교 현대모비스 SW 프로그램에서 진행한 인공지능 프로젝트입니다. Jetson nano를 이용해 pre-trained network를 fine tuning시켜 차량 및 신호등 인식을 구현하였습니다

Haneul Kim 1 Nov 16, 2021
Low Complexity Channel estimation with Neural Network Solutions

Interpolation-ResNet Invited paper for WSA 2021, called 'Low Complexity Channel estimation with Neural Network Solutions'. Low complexity residual con

Dianxin 10 Dec 10, 2022
This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

RobustFreqCNN About This repository contains the implementation of the paper "Towards Frequency-Based Explanation for Robust CNN" arxiv. It primarly d

Sarosij Bose 2 Jan 23, 2022
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Jan 06, 2023
Attention over nodes in Graph Neural Networks using PyTorch (NeurIPS 2019)

Intro This repository contains code to generate data and reproduce experiments from our NeurIPS 2019 paper: Boris Knyazev, Graham W. Taylor, Mohamed R

Boris Knyazev 242 Jan 06, 2023
Franka Emika Panda manipulator kinematics&dynamics simulation

pybullet_sim_panda Pybullet simulation environment for Franka Emika Panda Dependency pybullet, numpy, spatial_math_mini Simple example (please check s

0 Jan 20, 2022
Python package for Bayesian Machine Learning with scikit-learn API

Python package for Bayesian Machine Learning with scikit-learn API Installing & Upgrading package pip install https://github.com/AmazaspShumik/sklearn

Amazasp Shaumyan 482 Jan 04, 2023
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PabloPalafox 109 Nov 22, 2022
This code is for eCaReNet: explainable Cancer Relapse Prediction Network.

eCaReNet This code is for eCaReNet: explainable Cancer Relapse Prediction Network. (Towards Explainable End-to-End Prostate Cancer Relapse Prediction

Institute of Medical Systems Biology 2 Jul 28, 2022
Use .csv files to record, play and evaluate motion capture data.

Purpose These scripts allow you to record mocap data to, and play from .csv files. This approach facilitates parsing of body movement data in statisti

21 Dec 12, 2022
[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation [Paper] Prerequisites To install requirements: pip install -r requirements.txt

Guangrui Li 84 Dec 26, 2022
Code basis for the paper "Camera Condition Monitoring and Readjustment by means of Noise and Blur" (2021)

Camera Condition Monitoring and Readjustment by means of Noise and Blur This repository contains the source code of the paper: Wischow, M., Gallego, G

7 Dec 22, 2022
An unsupervised learning framework for depth and ego-motion estimation from monocular videos

SfMLearner This codebase implements the system described in the paper: Unsupervised Learning of Depth and Ego-Motion from Video Tinghui Zhou, Matthew

Tinghui Zhou 1.8k Dec 30, 2022
Official Pytorch implementation of Meta Internal Learning

Official Pytorch implementation of Meta Internal Learning

10 Aug 24, 2022
一些经典的CTR算法的复现; LR, FM, FFM, AFM, DeepFM,xDeepFM, PNN, DCN, DCNv2, DIFM, AutoInt, FiBiNet,AFN,ONN,DIN, DIEN ... (pytorch, tf2.0)

CTR Algorithm 根据论文, 博客, 知乎等方式学习一些CTR相关的算法 理解原理并自己动手来实现一遍 pytorch & tf2.0 保持一颗学徒的心! Schedule Model pytorch tensorflow2.0 paper LR ✔️ ✔️ \ FM ✔️ ✔️ Fac

luo han 149 Dec 20, 2022
VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data

VISNOTATE: An Opensource tool for Gaze-based Annotation of WSI Data Introduction Requirements Installation and Setup Supported Hardware and Software R

SigmaLab 1 Jun 14, 2022
Interpolation-based reduced-order models

Interpolation-reduced-order-models Interpolation-based reduced-order models High-fidelity computational fluid dynamics (CFD) solutions are time consum

Donovan Blais 1 Jan 10, 2022