CLOOB training (JAX) and inference (JAX and PyTorch)

Last update: Nov 27, 2022

cloob-training

Pretrained models

This repo currently provides two pretrained CLOOB models: ViT-B/16 checkpoints trained on LAION 400M for 16 and 32 epochs respectively.

Zero-shot ImageNet validation set accuracy (using OpenCLIP's code):

Model name	Top 1	Top 5
cloob_laion_400m_vit_b_16_16_epochs	0.61238	0.8492
cloob_laion_400m_vit_b_16_32_epochs	0.62816	0.85964
OpenAI CLIP ViT-B/32	0.6327	0.88772
OpenAI CLIP ViT-B/16	0.68132	0.91768
OpenAI CLIP ViT-L/14	0.75388	0.9454
OpenAI CLIP ViT-L/14 @ 336 px	0.76564	0.9515
OpenAI CLIP RN50	0.59806	0.86498
OpenAI CLIP RN101	0.62296	0.88106
OpenAI CLIP RN50x4	0.66268	0.9046
OpenAI CLIP RN50x16	0.70754	0.92822
OpenAI CLIP RN50x64	0.74134	0.94146
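
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how top-1 and top-5 accuracy are typically computed from per-class scores. It is not OpenCLIP's evaluation code; the function name `top_k_accuracy` and the toy scores are hypothetical and exist only for illustration.

```python
def top_k_accuracy(scores, targets, k=1):
    """Fraction of samples whose true class is among the k highest-scoring classes.

    scores:  list of per-sample score lists, one score per class
    targets: list of true class indices, one per sample
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k classes with the largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy data (hypothetical): 4 samples, 3 classes
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
targets = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, targets, k=1)
top2 = top_k_accuracy(scores, targets, k=2)
```

A top-5 value is always at least as large as the top-1 value for the same model, which matches every row of the table.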

PyTorch

from cloob_training import model_pt, pretrained

pretrained.list_configs()

returns:

['cloob_laion_400m_vit_b_16_16_epochs', 'cloob_laion_400m_vit_b_16_32_epochs']

To load a pretrained model:

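# Look up the config for the chosen pretrained model
config = pretrained.get_config('cloob_laion_400m_vit_b_16_16_epochs')
# Build the PyTorch model described by the config
model = model_pt.get_pt_model(config)
# Download the checkpoint and load its parameters into the model
checkpoint = pretrained.download_checkpoint(config)
model.load_state_dict(model_pt.get_pt_params(config, checkpoint))
# Switch to inference mode, disable gradients, and move the model to the GPU
model.eval().requires_grad_(False).to('cuda')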

Model class attributes:

model.config: the model config dict.

model.image_encoder: the image encoder, which expects NCHW batches of normalized images (preprocessed by model.normalize), where C = model.config['image_encoder']['input_channels'] and H, W = model.config['image_encoder']['image_size'].

model.text_encoder: the text encoder, which expects text tokenized by model.tokenize.

model.normalize: the preprocessor for image tensors.

model.tokenize: the preprocessor for text.
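
Once the two encoders have produced embeddings, zero-shot classification comes down to comparing one image embedding against one text embedding per candidate label. The sketch below shows only that comparison step, in plain Python, assuming cosine similarity as the score (the usual choice for CLIP-style models; this README does not state it explicitly). The helper names `cosine_similarity` and `rank_labels` and the toy 3-dimensional embeddings are hypothetical. In practice the embeddings would come from model.image_encoder and model.text_encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_labels(image_embedding, text_embeddings, labels):
    """Return (label, score) pairs sorted from best to worst match."""
    scored = [
        (cosine_similarity(image_embedding, text_embedding), label)
        for text_embedding, label in zip(text_embeddings, labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored]


# Toy stand-ins for encoder outputs (hypothetical values)
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

ranking = rank_labels(image_embedding, text_embeddings, labels)
best_label = ranking[0][0]
```

The top-ranked label is the zero-shot prediction; taking the first five entries of the ranking instead gives the candidates counted by a top-5 metric.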

JAX

Coming soon...

Training (JAX only)

Coming soon...

Owner

Katherine Crowson