ZeroVL - The official implementation of ZeroVL

Last update: Nov 04, 2022

Overview

This repository contains the source code needed to reproduce the results presented in the paper ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources.

Pioneering dual-encoder pre-training works (e.g., CLIP and ALIGN) require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevents researchers with limited resources from reproducing them and exploring further. To this end, we provide comprehensive training guidance that makes dual-encoder multi-modal representation alignment feasible with limited resources. We also provide a reproducible strong baseline with competitive results, namely ZeroVL, built with publicly accessible academic datasets and a popular experimental environment.
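
For readers new to dual-encoder alignment, the sketch below shows a CLIP-style symmetric contrastive loss in plain Python. It is an illustration only and not necessarily the exact objective used in this repository: the function names and the temperature value of 0.07 are assumptions, and a real training setup would use PyTorch tensors on GPU.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Hypothetical helper: average of image-to-text and text-to-image losses.

    image_emb[i] and text_emb[i] are assumed to come from the same pair.
    The temperature of 0.07 is an assumed value, not taken from this repository.
    """
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


if __name__ == "__main__":
    matched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]])
    swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[0.0, 1.0], [1.0, 0.0]])
    print(f"matched pairs: {matched:.6f}, swapped pairs: {swapped:.6f}")
```

The loss is small when each image embedding is closest to its own caption and large when the pairs are mismatched, which is the signal that pulls the two encoders into a shared space.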

Performance

Image-text retrieval RSUM scores on MSCOCO and Flickr30K datasets:

method	computation	data	COCO(zs.)	COCO(ft.)	F30K(zs.)	F30K(ft.)
CLIP	256 V100	400M	400.2	-	540.6	-
ALIGN	1024 TPUv3	1800M	425.3	500.4	553.3	576.0
baseline	8 V100	14.2M	363.5	471.9	476.8	553.0
ZeroVL	8 V100	14.2M	425.0	485.0	536.2	561.6
ZeroVL	8 V100	100M	442.1	500.5	546.5	573.6

zs.: zero-shot setting, ft.: fine-tuned setting.
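
RSUM is the sum of Recall@1, Recall@5, and Recall@10 over both retrieval directions (image-to-text and text-to-image), so its maximum is 600. The sketch below computes it from a similarity matrix. It assumes a simplified one-to-one pairing where image i matches text i; the standard MSCOCO and Flickr30K protocols pair each image with several captions, so this is not the repository's evaluation code, and the function names are hypothetical.

```python
def recall_at_k(sim, k):
    """Percentage of queries (rows) whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the true match = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def rsum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both directions.

    sim[i][j] is the similarity between image i and text j
    (assumed one-to-one pairing: image i matches text i).
    """
    sim_t = [list(col) for col in zip(*sim)]  # rows become text queries
    image_to_text = sum(recall_at_k(sim, k) for k in ks)
    text_to_image = sum(recall_at_k(sim_t, k) for k in ks)
    return image_to_text + text_to_image


if __name__ == "__main__":
    # Toy 3x3 example in which every pair is ranked first in both directions.
    toy = [
        [0.9, 0.1, 0.2],
        [0.3, 0.8, 0.1],
        [0.2, 0.4, 0.7],
    ]
    print(rsum(toy))
```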

Installation

Requirements:

Python 3.7
PyTorch 1.8.1
torchvision 0.9.1
CUDA 11.1

Install requirements:

pip3 install -r requirements.txt

Getting Started

Check GETTING_STARTED.md for codebase usage.

Model Zoo

We will release pre-trained models soon.

Citing ZeroVL

If you use ZeroVL in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@article{cui2021zerovl,
  title={ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources},
  author={Cui, Quan and Zhou, Boyan and Guo, Yu and Yin, Weidong and Wu, Hao and Yoshie, Osamu},
  journal={arXiv preprint arXiv:2112.09331},
  year={2021}
}

License

ZeroVL is released under the MIT license. See LICENSE for details.
