(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Last update: Jan 08, 2023

Overview

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui Qiu, Ling Shao.

[Paper][Chinese version][Video][Poster][MSRA_Slide][News1][News2][MSRA_Talking][Synced (机器之心)_Talking]

Introduction

We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers. In contrast to the random masking strategy of recent VL models, we design alignment guided masking to focus more on image-text semantic relations. To this end, we carry out five novel tasks, i.e., rotation, jigsaw, camouflage, grey-to-color, and blank-to-color, for self-supervised VL pre-training on patches of different scales. Kaleido-BERT is conceptually simple and easy to extend to the existing BERT framework. It attains state-of-the-art results by large margins on four downstream tasks, including text retrieval (R@1: 4.03% absolute improvement), image retrieval (R@1: 7.13% absolute improvement), category recognition (ACC: 3.28% absolute improvement), and fashion captioning (BLEU-4: 1.2 absolute improvement). We validate the efficiency of Kaleido-BERT on a wide range of e-commerce websites, demonstrating its broader potential in real-world applications.
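The idea of pre-training "on patches of different scales" can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the repository's code. It assumes that each level k cuts the image into a k × k grid for k = 1..5, giving 55 patches in total. It uses the rotation task to show how a patch becomes a self-supervised label. The names kaleido_patches and rotation_sample are made up for this sketch.

```python
import numpy as np


def kaleido_patches(image: np.ndarray, max_grid: int = 5) -> dict:
    """Split an H x W x C image into k x k grids for k = 1..max_grid.

    Hypothetical setup: level k cuts the image into k*k equal cells, and any
    leftover pixels at the right/bottom edge are dropped.
    """
    h, w = image.shape[:2]
    levels = {}
    for k in range(1, max_grid + 1):
        ph, pw = h // k, w // k  # cell height and width at this level
        levels[k] = [
            image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(k)
            for c in range(k)
        ]
    return levels


def rotation_sample(patch: np.ndarray, rng: np.random.Generator):
    """Build one self-supervised rotation example.

    Rotates the patch counter-clockwise by a random multiple of 90 degrees and
    returns (rotated_patch, label), where label 0..3 means 0/90/180/270 degrees.
    """
    label = int(rng.integers(0, 4))
    return np.rot90(patch, k=label, axes=(0, 1)), label


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((250, 250, 3))  # stand-in for a fashion product image
    levels = kaleido_patches(img)
    print(sum(len(p) for p in levels.values()))  # 55 (1 + 4 + 9 + 16 + 25)
    rotated, label = rotation_sample(levels[1][0], rng)
    print(rotated.shape, label)
```

Dropping the leftover pixels keeps every cell at a given level the same size, which simplifies batching. Resizing or padding the image first would be an equally valid choice.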

Notes

Code will be released on 2021/4/16.
This is the TensorFlow implementation built on Alibaba/EasyTransfer. We will also release a PyTorch version built on Huggingface/Transformers in the future.
If you find it hard to download these datasets, modify /dataset/get_pretrain_data.sh, /dataset/get_finetune_data.sh, and /dataset/get_retrieve_data.sh, and comment out whichever wget file links you don't need. This will not affect the following steps.
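If you would rather not edit the download scripts by hand, a small helper like the one below can comment out the wget lines you don't need. It is a hypothetical sketch, not part of the repository: the file name comment_out_wget.py and its arguments are made up here. It assumes each download is a single wget command on its own line.

```python
import sys


def comment_out_wget(script_text: str, keep: tuple = ()) -> str:
    """Prefix '# ' to every active wget line that contains none of `keep`.

    Lines that are not wget commands (including already commented-out ones)
    are returned unchanged, and original indentation is preserved.
    """
    out = []
    for line in script_text.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.startswith("wget ") and not any(t in line for t in keep):
            indent = line[: len(line) - len(stripped)]
            line = indent + "# " + stripped
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical usage: python comment_out_wget.py <script.sh> [keep ...]
    # Writes the edited script to stdout so the original file is untouched.
    if len(sys.argv) >= 2:
        with open(sys.argv[1]) as f:
            sys.stdout.write(comment_out_wget(f.read(), tuple(sys.argv[2:])))
```

For example, running it on scripts/dataset/get_pretrain_data.sh with one or more substrings to keep prints a copy in which only the matching wget lines stay active. Review that copy before running it in place of the original.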

Get started

Clone this code

git clone git@github.com:mczhuge/Kaleido-BERT.git
cd Kaleido-BERT

Environment setup (details can be found in conda_env.info)

conda create --name kaleidobert --file conda_env.info
conda activate kaleidobert
conda install tensorflow-gpu=1.15.0
pip install boto3 tqdm tensorflow_datasets --index-url=https://mirrors.aliyun.com/pypi/simple/
pip install sentencepiece==0.1.92 scikit-learn --index-url=https://mirrors.aliyun.com/pypi/simple/
pip install joblib==0.14.1
python setup.py develop

Download Pretrained Dependencies

cd Kaleido-BERT/scripts/checkpoint
sh get_checkpoint.sh

Finetune

#Download finetune datasets

cd Kaleido-BERT/scripts/dataset
sh get_finetune_data.sh
sh get_retrieve_data.sh

#Testing CAT/SUB (category and subcategory recognition)

cd Kaleido-BERT/scripts
sh run_cat.sh
sh run_subcat.sh

#Testing TIR/ITR (text-to-image and image-to-text retrieval)

cd Kaleido-BERT/scripts
sh run_i2t.sh
sh run_t2i.sh

Pre-training

#Download pre-training datasets

cd Kaleido-BERT/scripts/dataset
sh get_pretrain_data.sh

#Remove the existing checkpoint
rm -rf Kaleido-BERT/checkpoint/pretrained

#Run pre-training
cd Kaleido-BERT/scripts/
sh run_pretrain.sh

Acknowledgement

Thanks to the Alibaba ICBU Search Team and the Alibaba PAI Team for technical support.

Citing Kaleido-BERT

@InProceedings{Zhuge_2021_CVPR,
    author    = {Zhuge, Mingchen and Gao, Dehong and Fan, Deng-Ping and Jin, Linbo and Chen, Ben and Zhou, Haoming and Qiu, Minghui and Shao, Ling},
    title     = {Kaleido-BERT: Vision-Language Pre-Training on Fashion Domain},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12647-12657}
}

Contact

Mingchen Zhuge (email: [email protected] | WeChat: tjpxiaoming)
Deng-Ping Fan (email: [email protected])
Dehong Gao (email: [email protected])

Feel free to contact us if you have additional questions.
