It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Last update: Dec 20, 2022

Overview

CLIP-ONNX

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Usage

Install clip-onnx module and requirements first. Use this trick

!pip install git+https://github.com/Lednik7/CLIP-ONNX.git

Example in 3 steps

Download CLIP image from repo

!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true

Load standard CLIP model, image, text on cpu

import clip
from PIL import Image

# onnx cannot work with cuda
model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False)
# batch first
image = preprocess(Image.open("CLIP.png")).unsqueeze(0) # [1, 3, 224, 224]
text = clip.tokenize(["a diagram", "a dog", "a cat"]) # [3, 77]

Create CLIP-ONNX object to convert model to onnx

from clip_onnx import clip_onnx, attention
clip.model.ResidualAttentionBlock.attention = attention

visual_path = "clip_visual.onnx"
textual_path = "clip_textual.onnx"

# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
onnx_model = clip_onnx(model, providers=["CPUExecutionProvider"], # cpu mode
                       visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)
onnx_model.start_sessions()

Use for standard CLIP API. Batch inference

image_features = onnx_model.encode_image(image)
text_features = onnx_model.encode_text(text)

logits_per_image, logits_per_text = onnx_model(image, text)
probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # prints: [[0.41456965 0.29270944 0.29272085]]

Enjoy the speed

Examples

See examples folder for more details
Some parts of the code were taken from the post. Thank you neverix for this notebook.

Comments

Can't use CUDAExecutionProvider
Hey, I'm trying to use the code on GPU and I encountered 2 problems:

when running pip install git+https://github.com/Lednik7/CLIP-ONNX.git I got the following error (tried on multiple machines): ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu111 (from clip-onnx)

I fixed it by installing that version of torch by myself. with pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html, and then running the rest of the installation.

After I installed the package, I tried to run the example in the readme with CPUExecutionProvider and it worked fine, but when I'm trying to run it on GPU with CUDAExecutionProvider I get the following error message (again on different machines):

2022-01-31 20:57:03.234399301 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met. 2022-01-31 20:57:03.872349008 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:535 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

I can't figure out what is the problem. Any help?
opened by YoadTew 13
Performance is inconsistent with the original model
Hi, thanks for providing this useful tool! However, I found that the result produced by the generated ONNX model is inconsistent with the original CLIP model. Here is the code I used to test the original model:

model, preprocess = clip.load("ViT-B/32", device="cpu", jit=False) image = preprocess(Image.open("CLIP.png")).unsqueeze(0).cpu() # [1, 3, 224, 224] text = clip.tokenize(["a diagram", "a dog", "a cat"]).cpu() # [3, 77] image_features = model.encode_image(image) text_features = model.encode_text(text) logits_per_image, logits_per_text = model(image, text) probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy() print("Label probs:", probs)

The result is: Label probs: [[0.9927937 0.00421069 0.00299573]]

However, when using the onnx model, the result is: Label probs: [[0.41456965 0.29270944 0.29272085]].

Could you help me with this? Thanks!
opened by Cestlaviez 5

Error on installing the torch version in requirements.txt

pip install git+https://github.com/Lednik7/CLIP-ONNX.git

ERROR: Could not find a version that satisfies the requirement torch==1.11.0+cu113 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.11.0+cu113

python version is 3.7.13

opened by dingusagar 2

ERROR: No matching distribution found for onnxruntime==1.11

Hi, Thanks for the great work!

I am having this error when I try to install the package.

ERROR: No matching distribution found for onnxruntime==1.11

Maybe we can update the requirements.txt?

opened by wanliAlex 1
updated and added information

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages

opened by Lednik7 0
Replace the operator of "torch.einsum"

q, k, v = (torch.einsum("tbh, oh -> tbo", x, self.attn.in_proj_weight) + self.attn.in_proj_bias).contiguous().chunk( 3, dim=-1)

@Lednik7 Thanks for your great work on Clip-ONNX. for the pytorch operator of "torch.einsum" , if we don't want to use this operator , do you have other codes to replace this operator? this operator is not friendly to some Inference engine, like NV TensorRT, so if you have other codes to replace einsum, that will be better

opened by zhangnju 2

Releases(1.2)

1.2(May 3, 2022)

add info about export params

update GPU(K80) benchmarks

update GPU(T4) benchmarks

update CPU benchmarks

change opset_version to 12

updated readme according to the version

update branch link

update version

update packages
Source code(tar.gz)
Source code(zip)
1.0(May 3, 2022)

Works but with crutches
Source code(tar.gz)
Source code(zip)

Owner

Gerasimov Maxim

16 y.o. Data Scientist. Graduated by Yandex Lyceum and Tinkoff Education

GitHub Repository

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Versatile Multi-Modal Pre-Training for Human-Centric Perception Fangzhou Hong1 Liang Pan1 Zhongang Cai1,2,3 Ziwei Liu1* 1S-Lab, Nanyang Technologic

96 Jan 03, 2023

Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Good news! We release a clean version of PVNet: clean-pvnet, including how to train the PVNet on the custom dataset. Use PVNet with a detector. The tr

722 Dec 27, 2022

Doods2 - API for detecting objects in images and video streams using Tensorflow

DOODS2 - Return of DOODS Dedicated Open Object Detection Service - Yes, it's a b

101 Jan 04, 2023

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization This repository contains the evaluation code and alternative pseudo ground truth

36 Dec 22, 2022

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian

117 Jan 07, 2023

Semi-Supervised Learning for Fine-Grained Classification

Semi-Supervised Learning for Fine-Grained Classification This repo contains the code of: A Realistic Evaluation of Semi-Supervised Learning for Fine-G

25 Nov 08, 2022

PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

deepGCFX PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning" Pr

4 Aug 11, 2022

Wordle Env: A Daily Word Environment for Reinforcement Learning

Wordle Env: A Daily Word Environment for Reinforcement Learning Setup Steps: git pull [email&#

2 Mar 28, 2022

An improvement of FasterGICP: Acceptance-rejection Sampling based 3D Lidar Odometry

fasterGICP This package is an improvement of fast_gicp Please cite our paper if possible. W. Jikai, M. Xu, F. Farzin, D. Dai and Z. Chen, "FasterGICP:

79 Dec 31, 2022

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

96 Dec 22, 2022

A large-scale database for graph representation learning

29 Nov 25, 2022

Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Minimal PyTorch implementation of Generative Latent Optimization This is a reimplementation of the paper Piotr Bojanowski, Armand Joulin, David Lopez-

117 Nov 27, 2022

Several simple examples for popular neural network toolkits calling custom CUDA operators.

Neural Network CUDA Example Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators. We provide

798 Jan 01, 2023

Graduation Project

Gesture-Detection-and-Depth-Estimation This is my graduation project. (1) In this project, I use the YOLOv3 object detection model to detect gesture i

1 Nov 23, 2021

We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

Auto-exposure fusion for single-image shadow removal We propose a new method for effective shadow removal by regarding it as an exposure fusion proble

146 Dec 31, 2022

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

5.3k Jan 03, 2023

Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)

Stratified Transformer for 3D Point Cloud Segmentation Xin Lai*, Jianhui Liu*, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia

195 Jan 01, 2023

Repository for the paper "Online Domain Adaptation for Occupancy Mapping", RSS 2020

RSS 2020 - Online Domain Adaptation for Occupancy Mapping Repository for the paper "Online Domain Adaptation for Occupancy Mapping", Robotics: Science

26 Sep 22, 2022

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

54 Nov 21, 2022

Pytorch Implementation of LNSNet for Superpixel Segmentation

LNSNet Overview Official implementation of Learning the Superpixel in a Non-iterative and Lifelong Manner (CVPR'21) Learning Strategy The proposed LNS

42 Oct 11, 2022

It is a simple library to speed up CLIP inference up to 3x (K80 GPU)

Related tags

Overview

CLIP-ONNX

Usage

Example in 3 steps

Examples

Comments

Can't use CUDAExecutionProvider

Performance is inconsistent with the original model

Error on installing the torch version in requirements.txt

ERROR: No matching distribution found for onnxruntime==1.11

updated and added information

Replace the operator of "torch.einsum"

Releases(1.2)

1.2(May 3, 2022)

1.0(May 3, 2022)

Owner

Gerasimov Maxim

[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Doods2 - API for detecting objects in images and video streams using Tensorflow

On the Limits of Pseudo Ground Truth in Visual Camera Re-Localization

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

Semi-Supervised Learning for Fine-Grained Classification

PyTorch implementation for our AAAI 2022 Paper "Graph-wise Common Latent Factor Extraction for Unsupervised Graph Representation Learning"

Wordle Env: A Daily Word Environment for Reinforcement Learning

An improvement of FasterGICP: Acceptance-rejection Sampling based 3D Lidar Odometry

Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

A large-scale database for graph representation learning

Minimal PyTorch implementation of Generative Latent Optimization from the paper "Optimizing the Latent Space of Generative Networks"

Several simple examples for popular neural network toolkits calling custom CUDA operators.

Graduation Project

We propose a new method for effective shadow removal by regarding it as an exposure fusion problem.

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Stratified Transformer for 3D Point Cloud Segmentation (CVPR 2022)

Repository for the paper "Online Domain Adaptation for Occupancy Mapping", RSS 2020

[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Pytorch Implementation of LNSNet for Superpixel Segmentation