Bridging Vision and Language Model

Last update: Dec 27, 2022


BriVL

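BriVL (Bridging Vision and Language Model) is the first general-purpose, large-scale image-text multimodal pre-trained model for Chinese. It performs strongly on image-text retrieval and is reported to outperform other common multimodal pre-trained models of the same period (for example, UNITER and CLIP).

BriVL paper: WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training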

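Use cases

Example use cases: image-to-text retrieval, text-to-image retrieval, image annotation, zero-shot image classification, and providing input features for other downstream multimodal tasks.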
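As an illustration of the zero-shot classification use case, the sketch below picks the label whose text embedding is most similar to an image embedding. It assumes the embeddings have already been produced by the image and text encoders; the function names and the toy 3-dimensional vectors are hypothetical and are not part of the BriVL code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(image_vec, label_vecs):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(label_vecs, key=lambda label: cosine(image_vec, label_vecs[label]))

# Toy embeddings standing in for real encoder outputs (assumption).
label_vecs = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}
image_vec = [0.8, 0.3, 0.1]

predicted = zero_shot_classify(image_vec, label_vecs)
print(predicted)
```

In practice the label embeddings would come from encoding a short text for each candidate class, so new classes can be added without retraining.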

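Technical features

BriVL uses contrastive learning to map images and text into a shared feature space, which helps bridge the gap between image features and text features.
Because it is built on the assumption that vision and language are only weakly correlated, it can understand text that describes an image and also capture more abstract connections between images and text.
The image encoder and the text encoder can run independently of each other, which simplifies deployment in production environments.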
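To make the contrastive idea concrete, here is a minimal sketch of a generic InfoNCE-style loss over a batch of paired embeddings. It is not claimed to be the exact objective or implementation used by BriVL; the function names and the temperature value of 0.07 are illustrative assumptions.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(image_vecs, text_vecs, temperature=0.07):
    """Mean image-to-text InfoNCE loss; text i is the positive for image i."""
    imgs = [normalize(v) for v in image_vecs]
    txts = [normalize(v) for v in text_vecs]
    total = 0.0
    for i, img in enumerate(imgs):
        logits = [dot(img, t) / temperature for t in txts]
        # Numerically stable log-sum-exp over all candidate texts.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(imgs)

images = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(images, [[1.0, 0.1], [0.1, 1.0]])   # matching pairs line up
shuffled = info_nce(images, [[0.1, 1.0], [1.0, 0.1]])  # pairs are swapped
print(aligned < shuffled)
```

The loss is low when each image is closest to its own text and high when the pairs are mismatched, which is what pulls matching images and texts together in the shared space.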

Downloads

Model	Language	Parameters	File
BriVL-1.0	Chinese	1 billion	BriVL-1.0-5500w.tar

Using BriVL

Environment setup

# Requirements
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1

The dependencies are listed in requirements.txt and can be installed with the following command:

pip install -r requirements.txt
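After installation, it can be useful to confirm that the installed versions match the pins. The following is a small sketch using the standard library's importlib.metadata; the helper names are hypothetical, and it assumes every line of interest has the form name==version.

```python
from importlib import metadata

def parse_pins(lines):
    """Parse 'name==version' lines, skipping blanks, comments, and other forms."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins

def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches

sample = ["# Requirements", "torch==1.8.1", "", "timm==0.4.12"]
pins = parse_pins(sample)
print(pins)
```

To check a real environment, pass the file contents instead of the sample, for example `check_pins(parse_pins(open("requirements.txt")))`; an empty result means every pinned package is installed at the requested version.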

Feature extraction and retrieval results

cd evaluation/
bash test_xyb.sh
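The output format of the script is not described here. As a rough sketch of how retrieval quality is commonly summarized, the code below computes recall@K from a query-by-candidate similarity matrix, assuming that row i's correct match is column i. The function name and the toy matrix are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose matching item (same index) ranks in the top k."""
    hits = 0
    for query_index, row in enumerate(similarity):
        # Candidate indices sorted from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if query_index in ranked[:k]:
            hits += 1
    return hits / len(similarity)

# Toy similarity scores: rows are queries, columns are candidates (assumption).
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(similarity, 1), recall_at_k(similarity, 2))
```

Here the second query ranks its true match second, so it counts as a hit at K=2 but not at K=1.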

Data description

Three example image-text pairs are included:

./data/imgs  # images
./data/jsonls # image-text pair descriptions
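The record schema of the description files is not documented here, so the sketch below makes no assumption about field names. It reads JSON Lines files and reports which keys the first record of each file contains. The helper names and the `.jsonl` file extension are assumptions.

```python
import json
from pathlib import Path

def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)

def summarize_keys(folder):
    """Map each .jsonl file name in `folder` to the sorted keys of its first record."""
    summary = {}
    for path in sorted(Path(folder).glob("*.jsonl")):
        first = next(read_jsonl(path), None)
        summary[path.name] = sorted(first) if isinstance(first, dict) else []
    return summary
```

Running `summarize_keys("./data/jsonls")` on the bundled examples is a quick way to see which fields link a description to its image before preparing your own data.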

Citing BriVL

@article{DBLP:journals/corr/abs-2103-06561,
  author    = {Yuqi Huo and
               Manli Zhang and
               Guangzhen Liu and
               Haoyu Lu and
               Yizhao Gao and
               Guoxing Yang and
               Jingyuan Wen and
               Heng Zhang and
               Baogui Xu and
               Weihao Zheng and
               Zongzheng Xi and
               Yueqian Yang and
               Anwen Hu and
               Jinming Zhao and
               Ruichen Li and
               Yida Zhao and
               Liang Zhang and
               Yuqing Song and
               Xin Hong and
               Wanqing Cui and
               Dan Yang Hou and
               Yingyan Li and
               Junyi Li and
               Peiyu Liu and
               Zheng Gong and
               Chuhao Jin and
               Yuchong Sun and
               Shizhe Chen and
               Zhiwu Lu and
               Zhicheng Dou and
               Qin Jin and
               Yanyan Lan and
               Wayne Xin Zhao and
               Ruihua Song and
               Ji{-}Rong Wen},
  title     = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
  journal   = {CoRR},
  volume    = {abs/2103.06561},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.06561},
  archivePrefix = {arXiv},
  eprint    = {2103.06561},
  timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
