CLIPfa: Connecting Farsi Text and Images

Overview

CLIPfa: Connecting Farsi Text and Images

OpenAI released the paper Learning Transferable Visual Models From Natural Language Supervision in which they present the CLIP (Contrastive Language–Image Pre-training) model. This model is trained to connect text and images, by matching their corresponding vector representations using a contrastive learning objective. CLIP consists of two separate models, a vision encoder and a text encoder. These were trained on a wooping 400 Million images and corresponding captions. We have trained a Farsi (Persian) version of OpenAI's CLIP on a dataset of 400,000 (image, text) pairs. We used Farahani's RoBERTa-fa as the text encoder and ‍‍ViT‍ as the vision encoder from Original CLIP and finetuned them.

CLIPfa image

It should be noted that only 400K pairs were used for this training, whereas 4 million pairs were used for the Original CLIP. Also, the training took 30 days across 592 GPUs powered by the V100 chip.

How to use?

Both models generate vectors with 768 dimensions.

from transformers import CLIPVisionModel, RobertaModel, AutoTokenizer, CLIPFeatureExtractor
# download pre-trained models
vision_encoder = CLIPVisionModel.from_pretrained('SajjadAyoubi/clip-fa-vision')
preprocessor = CLIPFeatureExtractor.from_pretrained('SajjadAyoubi/clip-fa-vision')
text_encoder = RobertaModel.from_pretrained('SajjadAyoubi/clip-fa-text')
tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')
# define input image and input text
text = 'something'
image = PIL.Image.open('my_favorite_image.jpg')
# compute embeddings
text_embedding = text_encoder(**tokenizer(text, return_tensors='pt')).pooler_output
image_embedding = vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output
text_embedding.shape == image_embedding.shape

Demo:

The followings are just some use cases of CLIPfa on 25K Unsplash images

  • use pip install -q git+https://github.com/sajjjadayobi/clipfa.git
from clipfa import CLIPDemo
demo = CLIPDemo(vision_encoder, text_encoder, tokenizer)
demo.compute_text_embeddings(['گاو' ,'اسب' ,'ماهی'])
demo.compute_image_embeddings(test_df.image_path.to_list())

Image Search:

demo.image_search(query='غروب خورشید')

demo.image_search(query='جنگل در زمستان برفی')

Analogy:

demo.anology('sunset.jpg', additional_text='دریا')

demo.anology('sunset.jpg', additional_text='برف')

Zero Shot Image Classification:

demo.zero_shot(image_path='apples.jpg')
  • Provided labels with their probability for each image.
گاو:36 , ماهی:22, اسب:42 گاو:41 , ماهی:23, اسب:36 گاو:26 , ماهی:45, اسب:27
image image image

Online Demo: CLIPfa at Huggingface 🤗 spaces

We used a small set of images (25K) to keep this app almost real-time, but it's obvious that the quality of image search depends heavily on the size of the image database.

Dataset: 400K

We started with this question that how much the original Clip model depends on its big training dataset containing a lot of conceptual samples. Our model shows that It is possible to meet an acceptable enough target with only a little amount of data even though, It may not have known enough concepts and subjects to be used widely. Our model trained on a dataset gathered from different resources such as The Flickr30k, MS-COCO 2017, Google CCm3, ... . We used these datasets and translated them into the Persian language with a tool prepared by ourselves. Using the Google Translate and Multilingual Similarity Check method we provided an automatic translator that has been given a list of English captions and filtered by the best translations.

  • Note: We used image2ds a great tool to download large scale image datasets such as MS-COCO. It can download, resize and package 100M urls in 20h on one machine. Also supports saving captions for url+caption datasets.
  • coco-flickr-fa 130K on Kaggle

Training:

Any dataset can be used with little change by the training code. CLIPfa can be trained with other encoders as long as they have the same hidden size at the last layer. In this notebook I used training code to train a small CLIP on translated flickr30K dataset.

Citation: ↩️

If you have a technical question regarding the model, code or publication, create an issue in the repository. we didn't publish any papers on the work. However, if you did, please cite us properly with an entry like one below.

@misc{ParsBigBird,
  author          = {Sajjad Ayoubi, Navid Kanaani},
  title           = {CLIPfa: Connecting Farsi Text and Images},
  year            = 2021,
  publisher       = {GitHub},
  journal         = {GitHub repository},
  howpublished    = {\url{https://github.com/SajjjadAyobi/CLIPfa}},
}

Made with ❤️ in my basement 🤫

Owner
Sajjad Ayoubi
Wants to be a Machine Learning Engineer
Sajjad Ayoubi
Paradigm Shift in NLP - "Paradigm Shift in Natural Language Processing".

Paradigm Shift in NLP Welcome to the webpage for "Paradigm Shift in Natural Language Processing". Some resources of the paper are constantly maintaine

Tianxiang Sun 41 Dec 30, 2022
小布助手对话短文本语义匹配的一个baseline

oppo-text-match 小布助手对话短文本语义匹配的一个baseline 模型 参考:https://kexue.fm/archives/8213 base版本线下大概0.952,线上0.866(单模型,没做K-flod融合)。 训练 测试环境:tensorflow 1.15 + keras

苏剑林(Jianlin Su) 132 Dec 14, 2022
MiCECo - Misskey Custom Emoji Counter

MiCECo Misskey Custom Emoji Counter Introduction This little script counts custo

7 Dec 25, 2022
Simple Text-To-Speech Bot For Discord

Simple Text-To-Speech Bot For Discord This is a very simple TTS bot for discord made with python. For this bot you need FFMPEG, see installation to se

1 Sep 26, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
This repository contains examples of Task-Informed Meta-Learning

Task-Informed Meta-Learning This repository contains examples of Task-Informed Meta-Learning (paper). We consider two tasks: Crop Type Classification

10 Dec 19, 2022
Code for hyperboloid embeddings for knowledge graph entities

Implementation for the papers: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao,

30 Dec 10, 2022
precise iris segmentation

PI-DECODER Introduction PI-DECODER, a decoder structure designed for Precise Iris Segmentation and Location. The decoder structure is shown below: Ple

8 Aug 08, 2022
TFIDF-based QA system for AIO2 competition

AIO2 TF-IDF Baseline This is a very simple question answering system, which is developed as a lightweight baseline for AIO2 competition. In the traini

Masatoshi Suzuki 4 Feb 19, 2022
The source code of HeCo

HeCo This repo is for source code of KDD 2021 paper "Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning". Paper Link: htt

Nian Liu 106 Dec 27, 2022
source code for paper: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach.

WhiteningBERT Source code and data for paper WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach. Preparation git clone https://github.com

49 Dec 17, 2022
Simple Annotated implementation of GPT-NeoX in PyTorch

Simple Annotated implementation of GPT-NeoX in PyTorch This is a simpler implementation of GPT-NeoX in PyTorch. We have taken out several optimization

labml.ai 101 Dec 03, 2022
State of the Art Natural Language Processing

Spark NLP: State of the Art Natural Language Processing Spark NLP is a Natural Language Processing library built on top of Apache Spark ML. It provide

John Snow Labs 3k Jan 05, 2023
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 50 Dec 21, 2022
The swas programming language

The Swas programming language This is a language that was made for fun. Installation Step 0: Make sure you have python installed Step 1. Clone this re

Swas.py 19 Jul 18, 2022
Generating new names based on trends in data using GPT2 (Transformer network)

MLOpsNameGenerator Overall Goal The goal of the project is to develop a model that is capable of creating Pokémon names based on its description, usin

Gustav Lang Moesmand 2 Jan 10, 2022
초성 해석기 based on ko-BART

초성 해석기 개요 한국어 초성만으로 이루어진 문장을 입력하면, 완성된 문장을 예측하는 초성 해석기입니다. 초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 예측 문장: 나는 너를 좋아해 모델 모델은 SKT-AI에서 공개한 Ko-BART를 이용합니다. 데이터 문장 단위로 이루어진 아무 코퍼스나

Dawoon Jung 29 Oct 28, 2022
[NeurIPS 2021] Code for Learning Signal-Agnostic Manifolds of Neural Fields

Learning Signal-Agnostic Manifolds of Neural Fields This is the uncleaned code for the paper Learning Signal-Agnostic Manifolds of Neural Fields. The

60 Dec 12, 2022
Search for documents in a domain through Google. The objective is to extract metadata

MetaFinder - Metadata search through Google _____ __ ___________ .__ .___ / \

Josué Encinar 85 Dec 16, 2022
VD-BERT: A Unified Vision and Dialog Transformer with BERT

VD-BERT: A Unified Vision and Dialog Transformer with BERT PyTorch Code for the following paper at EMNLP2020: Title: VD-BERT: A Unified Vision and Dia

Salesforce 44 Nov 01, 2022