OCR engine for all the languages

Overview

Description


kraken is a turn-key OCR system optimized for historical and non-Latin script material.

kraken's main features are:

  • Fully trainable layout analysis and character recognition
  • Right-to-Left, BiDi, and Top-to-Bottom script support
  • ALTO, PageXML, abbyXML, and hOCR output
  • Word bounding boxes and character cuts
  • Multi-script recognition support
  • Public repository of model files
  • Lightweight model files
  • Variable recognition network architectures

Installation

When using a recent version of pip, all dependencies are installed from binary wheel packages, so installing build-essential or your distribution's equivalent is often unnecessary. kraken only runs on Linux or Mac OS X. Windows is not supported.

Install the latest development version through conda:

$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment.yml
$ conda env create -f environment.yml

or:

$ wget https://raw.githubusercontent.com/mittagessen/kraken/master/environment_cuda.yml
$ conda env create -f environment_cuda.yml

for CUDA acceleration with the appropriate hardware.

It is also possible to install the latest stable release from PyPI:

$ pip install kraken

Finally, you'll have to scrounge up a model to do the actual recognition of characters. To download the default model for printed English text and place it in the kraken directory for the current user:

$ kraken get 10.5281/zenodo.2577813

A list of libre models available in the central repository can be retrieved by running:

$ kraken list

Quickstart

To recognize text in an image using the default parameters, including the prerequisite steps of binarization and page segmentation:

$ kraken -i image.tif image.txt binarize segment ocr

To binarize a single image using the nlbin algorithm:

$ kraken -i image.tif bw.png binarize

To segment an image (binarized or not) with the new baseline segmenter:

$ kraken -i image.tif lines.json segment -bl

To segment and OCR an image using the default model(s):

$ kraken -i image.tif image.txt segment -bl ocr

All subcommands and options are documented. Use the help option to get more information.
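For scripted use, the quickstart pipelines can be assembled programmatically. The following is a minimal sketch, not part of kraken itself: `kraken_command` and `run_kraken` are hypothetical helper names, and the only flags used are the ones shown in the commands above. It assumes the `kraken` executable is on the `PATH`.

```python
import shlex
import subprocess


def kraken_command(image, output, steps):
    """Build the argv list for a kraken pipeline such as segment -> ocr."""
    argv = ["kraken", "-i", image, output]
    for step in steps:
        # Each step is a subcommand optionally followed by its own flags,
        # e.g. "segment -bl"; shlex.split applies shell-like quoting rules.
        argv.extend(shlex.split(step))
    return argv


def run_kraken(image, output, steps):
    """Run the pipeline; raises CalledProcessError if kraken exits non-zero."""
    return subprocess.run(kraken_command(image, output, steps), check=True)


if __name__ == "__main__":
    # Mirrors: kraken -i image.tif image.txt segment -bl ocr
    print(kraken_command("image.tif", "image.txt", ["segment -bl", "ocr"]))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.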

Documentation

Have a look at the docs

Funding

kraken is developed at the École Pratique des Hautes Études, Université PSL.

Comments
  • Training for Devanagari


    I am trying to build a Devanagari model using kraken. When I use default values for training it works but when I specify training and eval data separately, I get a codec error.

    The following uses the same set of training data.

    This worked:

    ketos train devatrain/*.png > devatrain.log
    
    WARNING: Logging before flag parsing goes to stderr.
    W0218 04:40:21.871934 127598883522176 __init__.py:74] TensorFlow version 1.15.0 detected. Last version known to be fully compatible is 1.14.0 .
    Initializing model ✓
    

    This gets the error:

    ketos -v  train -t devatrain/*.png -e devatrain/*.png -o devatraintest > devatraintest.log
    
    Traceback (most recent call last):
      File "/home/ubuntu/anaconda3/envs/py36/bin/ketos", line 8, in <module>
        sys.exit(cli())
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1135, in invoke
        sub_ctx = cmd.make_context(cmd_name, args, parent=ctx)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 641, in make_context
        self.parse_args(ctx, args)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 940, in parse_args
        value, args = param.handle_parse_result(ctx, opts, args)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 1477, in handle_parse_result
        self.callback, ctx, self, value)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/click/core.py", line 96, in invoke_param_callback
        return callback(ctx, param, value)
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/site-packages/kraken/ketos.py", line 63, in _validate_manifests
        for entry in manifest.readlines():
      File "/home/ubuntu/anaconda3/envs/py36/lib/python3.6/codecs.py", line 321, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
    
    opened by Shreeshrii 22
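The traceback above ends in `_validate_manifests`, which reads each `-t`/`-e` argument as a text file, and byte 0x89 is the first byte of every PNG file. This suggests the shell glob handed image files to options that expect manifests. The sketch below reproduces the decode error and writes a manifest; the one-path-per-line layout is an assumption inferred from the `manifest.readlines()` call, and `write_manifest` is a hypothetical helper, not a kraken function.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_error_for(data):
    """Return the UnicodeDecodeError message raised when data is read as UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


def write_manifest(image_dir, manifest_path, pattern="*.png"):
    """Write one image path per line (assumed manifest layout) and return the count."""
    paths = sorted(str(p) for p in Path(image_dir).glob(pattern))
    Path(manifest_path).write_text("".join(p + "\n" for p in paths), encoding="utf-8")
    return len(paths)


if __name__ == "__main__":
    # Prints the same message as the last line of the traceback.
    print(decode_error_for(PNG_SIGNATURE))
```

Under that assumption, the manifest file rather than the expanded glob would be passed to `-t` and `-e`.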
  • New training code is eating memory


    I'm trying to train a Japanese OCR model using 2637 images with PAGE XML files:

    ketos train -d cuda:0 -f page -F 1 -q early --augment -o jpn *.xml

    With 3.0b5, ketos consumes about 2.1 GB of memory, which is reasonable. With master (the patch https://github.com/mittagessen/kraken/pull/199 is needed to run it), ketos eats more than 30 GB of memory and hangs.

    It seems all images are stored in memory. Is there any option not to do that?

    opened by eighttails 21
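The question above comes down to eager versus lazy loading. The following is a generic sketch of a lazily loading dataset, not kraken's implementation: `LazyImageDataset` and the `loader` callable are hypothetical names. Only the file paths are held in memory; each image is read when it is requested.

```python
class LazyImageDataset:
    """Keep only paths in memory and load each sample on access."""

    def __init__(self, paths, loader):
        self.paths = list(paths)
        self.loader = loader  # callable mapping a path to a decoded sample

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The image is read here, not in __init__, so memory use stays
        # proportional to the samples currently in flight.
        return self.loader(self.paths[index])


if __name__ == "__main__":
    calls = []

    def fake_loader(path):
        calls.append(path)
        return f"pixels of {path}"

    dataset = LazyImageDataset(["a.png", "b.png"], fake_loader)
    print(len(dataset), len(calls))  # nothing loaded yet
    print(dataset[1], len(calls))    # exactly one load
```

The trade-off is that every epoch re-reads the files from disk, so throughput depends on storage speed unless a bounded cache is added.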
  • Docs for the segmentation training format


    Hey there, I'd like to train a segmenter before leaving next week, and I understood you will release it soon. Is there any place where I can find the segmenter training format?

    opened by PonteIneptique 17
  • Training kraken and RTL support?


    @amitdo commented here on the specific RTL support in kraken. Since I have been unsuccessfully training OCR models for Hebrew with ocropy, I wonder if kraken could do the job. Can anyone introduce me to the details of kraken's RTL support? I could not find the related information in the documentation. Many thanks in advance!

    opened by wrznr 17
  • Only one model in the repository?


    I just installed kraken and kraken list gives only 10.5281/zenodo.2577813 (pytorch) - A generalized model for English printed text. Where are other models? I'm especially interested in Medieval Latin.

    opened by jsbien 16
  • Error when i run kraken segment ocr


    I ran the command below and received the following error.

    This happens in both python3.6.8 and python3.7.2

     kraken -i bank.png bank.json segment  --remove_hlines --no-script-detect --scale 20 --pad 100 23 ocr --model en-default.pronn 
    Loading RNN default	✓
    Segmenting	✓
    Traceback (most recent call last):
      File "/home/ram/code/lendsmart/py/kraken/venv/bin/kraken", line 10, in <module>
        sys.exit(cli())
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/core.py", line 1164, in invoke
        return _process_result(rv)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/core.py", line 1102, in _process_result
        **ctx.params)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
        return callback(*args, **kwargs)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/kraken.py", line 220, in process_pipeline
        task(base_image=base_image, input=input, output=output)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/kraken.py", line 157, in recognizer
        for pred in bar:
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/click/_termui_impl.py", line 285, in generator
        for rv in self.iter:
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/rpred.py", line 301, in rpred
        preds = network.predict(line)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/lib/models.py", line 81, in predict
        o = self.forward(line)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/lib/models.py", line 69, in forward
        o = self.nn.nn(line)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
        input = module(input)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/kraken/lib/layers.py", line 350, in forward
        o, _ = self.layer(inputs)
      File "/home/ram/code/lendsmart/py/kraken/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
    TypeError: forward() missing 1 required positional argument: 'hidden'
    

    Here is my pip list in venv

    screenshot from 2019-01-28 00-49-37

    Let me know if I missed something.

    Are my installed package versions correct?

    Should I use a different version of torch?

    opened by kishoreneelamegam 16
  • Bad Credentials


    Hello, I installed kraken using pip3 and everything went fine, but I cannot use it. As soon as I try to get the default model, I get the following error message:

    $ kraken get default
    Retrieving model	⣾Traceback (most recent call last):
      File "/usr/local/bin/kraken", line 11, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 1092, in invoke
        rv.append(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python3.6/site-packages/kraken/kraken.py", line 346, in get
        partial(spin, 'Retrieving model'))
      File "/usr/local/lib/python3.6/site-packages/kraken/repo.py", line 46, in get_model
        raise KrakenRepoException(resp['message'])
    kraken.lib.exceptions.KrakenRepoException: Bad credentials
    

    I removed kraken and tried again with a pip2 install, but the result remains the same:

    $ kraken get default
    Retrieving model	⣾Traceback (most recent call last):
      File "/usr/local/bin/kraken", line 11, in <module>
        sys.exit(cli())
      File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python2.7/site-packages/click/core.py", line 1092, in invoke
        rv.append(sub_ctx.command.invoke(sub_ctx))
      File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
        return f(get_current_context(), *args, **kwargs)
      File "/usr/local/lib/python2.7/site-packages/kraken/kraken.py", line 346, in get
        partial(spin, 'Retrieving model'))
      File "/usr/local/lib/python2.7/site-packages/kraken/repo.py", line 46, in get_model
        raise KrakenRepoException(resp['message'])
    kraken.lib.exceptions.KrakenRepoException: Bad credentials
    

    What did I miss? (I use macOS 10.12.6 with Python 2.7 or 3.6.)

    opened by loranger 16
  • `Failed processing image.png: ndim`


    When I run the following command:

    kraken -i OCR17plus/Data/Balzac1624_Lettres_btv1b86262420_corrected/png/Balzac1624_Lettres_btv1b86262420_corrected_0042.png results.txt segment -bl -i OCR17plus/Model/Segment/appenzeller.mlmodel ocr -m OCR17plus/Model/HTR/dentduchat.mlmodel
    

    I have the following issue:

    [37.7261] Failed processing OCR17plus/Data/Balzac1624_Lettres_btv1b86262420_corrected/png/Balzac1624_Lettres_btv1b86262420_corrected_0042.png: ndim 
    

    The data used is available here: https://github.com/Heresta/OCR17plus

    Any idea what the problem could be?

    opened by gabays 15
  • Ketos segtrain applying topline when asking for baseline and vice-versa ?


    Hey there,

    I can be wrong, but I have had weird behavior training the segmenter. I believe there is a mistake in the Click definition of the segmenter, but again, I can be wrong...

    In the click definition, you have -bl/-tl:

    https://github.com/mittagessen/kraken/blob/45f6bb0b1b1632de6077e6712f11d5461ddad63a/kraken/ketos.py#L158-L161

    This value of baseline (True or False) is passed down

    https://github.com/mittagessen/kraken/blob/45f6bb0b1b1632de6077e6712f11d5461ddad63a/kraken/ketos.py#L260

    to the topline kwarg and can be reused here:

    https://github.com/mittagessen/kraken/blob/abe08ba4bec3778cf9e15c5cea7c0de503358a61/kraken/lib/segmentation.py#L440-L444

    In this excerpt, it shows that topline=True will use topline (which makes sense).

    If you create a file containing just the same click declaration, such as:

    import click

    @click.command()
    @click.option('-bl/-tl', '--baseline/--topline', show_default=True,
                  default=False, help='Switch for the baseline location in the scripts. '
                  'Set to topline if the data is annotated with a hanging baseline, as is '
                  'common with Hebrew, Bengali, Devanagari, etc.')
    def bl(baseline):
        print(baseline)

    if __name__ == "__main__":
        bl()
    

    you'll see that -bl makes baseline=True, which is then used directly as topline=True. Conversely, using -tl sets it to False. Basically, it seems to be the opposite of what is expected.

    opened by PonteIneptique 15
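One way to avoid the inversion described above is to convert the flag explicitly before handing it to a `topline` keyword. This is a sketch using the standard library's `argparse` rather than click, so it is not the project's code; `build_parser` and `topline_kwarg` are hypothetical names, and choosing baseline as the default is an assumption.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Both flags write to the same destination, mirroring the click flag pair.
    parser.add_argument("-bl", "--baseline", dest="baseline", action="store_true")
    parser.add_argument("-tl", "--topline", dest="baseline", action="store_false")
    # Assumption: data annotated with a regular baseline is the default case.
    parser.set_defaults(baseline=True)
    return parser


def topline_kwarg(argv):
    """Translate the parsed flag into the value a `topline` keyword expects."""
    args = build_parser().parse_args(argv)
    # Explicit inversion: baseline=True must become topline=False.
    return not args.baseline


if __name__ == "__main__":
    for argv in ([], ["-bl"], ["-tl"]):
        print(argv, "-> topline =", topline_kwarg(argv))
```

Note that with an inversion in place the default has to flip as well: keeping `default=False` for `baseline` would make topline the default.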
  • Segmenter proposes creative segmentations


    Hey, To follow up on https://github.com/mittagessen/kraken/issues/256, I felt like a new issue was in order, as the first was targeted at the inversion of -bl and -tl.

    Before today, training and segmenting would provide a rather good segmentation but strike-through: image

    Since the commits of today, the segmentation is completely all over the place, using the same training set, eval set and command: image

    @gabays has seen the same issue

    opened by PonteIneptique 14
  • Multi-process/thread Dataset building ?


    Hi there :) I was wondering if it'd be possible to speed up building the training and validation sets a little? I looked at it and it seems quite "sequential":

    https://github.com/mittagessen/kraken/blob/d39c45564df81d84bea58ee0067a48585a5f63e2/kraken/lib/train.py#L624-L632

    I could take a shot at it if you do not have the time, and if you give me pointers on how you'd like it to be done (proxy function, reusing --threads, etc.).

    opened by PonteIneptique 14
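The request above could be approached with a worker pool from the standard library. This is a generic sketch, not kraken's training code: `build_dataset` and `load_sample` are hypothetical names, and the worker count stands in for whatever a `--threads`-style option would supply.

```python
from concurrent.futures import ThreadPoolExecutor


def build_dataset(paths, load_sample, workers=4):
    """Load every sample concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves the order of `paths`, so the dataset is
        # identical to the one a sequential loop would build.
        return list(pool.map(load_sample, paths))


if __name__ == "__main__":
    print(build_dataset(["a", "b", "c"], str.upper, workers=2))
```

Threads mainly help when loading is I/O-bound. For CPU-bound image decoding, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface, provided `load_sample` and its results can be pickled.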
  • install contradiction


    mamba env create -f environment_cuda.yml

    leaves me with numpy 1.19.5, which in turn gives the error:

    RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

    My attempts to update numpy failed, as pip says:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    qudida 0.0.4 requires opencv-python-headless>=4.0.1, which is not installed.
    albumentations 1.3.0 requires opencv-python-headless>=4.1.1, which is not installed.
    kraken 4.2.1.dev84 requires numpy<=1.23.0, but you have numpy 1.24.1 which is incompatible.
    coremltools 4.1 requires numpy<1.20,>=1.14.5, but you have numpy 1.24.1 which is incompatible.

    opened by dstoekl 0
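The two numpy pins quoted in the pip output can be checked against each other directly. The sketch below hard-codes only the constraints reported above and uses a deliberately simple version parser (numeric dotted versions only, an assumption that holds for the versions mentioned); `violations` is a hypothetical helper.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">=": operator.ge, ">": operator.gt}

# Constraints copied from the pip error message in the report.
CONSTRAINTS = {
    "kraken 4.2.1.dev84": [("<=", "1.23.0")],
    "coremltools 4.1": [("<", "1.20"), (">=", "1.14.5")],
}


def parse(version, width=3):
    """Turn '1.20' into (1, 20, 0) so versions compare component-wise."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def violations(numpy_version):
    """List the packages whose numpy pin rejects the given version."""
    found = parse(numpy_version)
    return [
        package
        for package, rules in CONSTRAINTS.items()
        if not all(OPS[op](found, parse(bound)) for op, bound in rules)
    ]


if __name__ == "__main__":
    for candidate in ("1.19.5", "1.23.0", "1.24.1"):
        print(candidate, violations(candidate))
```

Only versions in the range `>=1.14.5,<1.20` satisfy both pins, which is consistent with the environment resolving to numpy 1.19.5; the runtime error then indicates that some installed extension was built against a newer numpy API than that.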
  • Windows


    Is there any possibility to run it in Windows? Can Ubuntu be installed from Windows Store and then be used?

    Does the model "Kraken:arabPersPrBigMixed_best(1)" ring a bell? If so, where can it be downloaded? More details are in this video, at minute 05:21, from the Open ITI project.

    Thanks,

    Medo Hamdani

    opened by MedoHamdani 0
  • in training phase ketos train repeats warning for unicode points not in training data.


    WARNING Non-encodable sequence ︎◻ ךו... encountered. Advancing one code point. codec.py:131 etc

    this is repeated for every epoch

    Suggestion: suppress this, since the difference between the training and testing codecs already triggers a separate yellow warning before training starts.

    opened by dstoekl 0
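The suppression suggested above can be sketched with the standard `logging` module. This is not how kraken configures its logging; `DedupFilter` and `ListHandler` are hypothetical names, and the warning text is shortened from the message in the report.

```python
import logging


class DedupFilter(logging.Filter):
    """Let each distinct (level, message) pair through only once."""

    def __init__(self):
        super().__init__()
        self._seen = set()

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        if key in self._seen:
            return False  # drop repeats from later epochs
        self._seen.add(key)
        return True


class ListHandler(logging.Handler):
    """Collect emitted messages in a list so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo(epochs=3):
    logger = logging.getLogger("dedup-demo")
    logger.setLevel(logging.WARNING)
    logger.propagate = False
    handler = ListHandler()
    handler.addFilter(DedupFilter())
    logger.addHandler(handler)
    for _epoch in range(epochs):
        logger.warning("Non-encodable sequence encountered. Advancing one code point.")
    logger.removeHandler(handler)
    return handler.messages


if __name__ == "__main__":
    print(demo())
```

Attaching the filter to the handler rather than the logger keeps the deduplication local to one output, so a log file could still record every occurrence.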
  • `ketos train` repeats validation in a loop if early stopping comes too early


    The training was started with at least 200 epochs and 20 tries to get a better model:

    ketos train -f page -t list.train -e list.eval -o Juristische_Konsilien_Tuebingen+256 -d cuda:0 --augment --workers 24 -r 0.0001 -B 1 --min-epochs 200 --lag 20 -w 0 -s '[256,64,0,1 Cr4,2,8,4,2 Cr4,2,32,1,1 Mp4,2,4,2 Cr3,3,64,1,1 Mp1,2,1,2 S1(1x0)1,3 Lbx256 Do0.5 Lbx256 Do0.5 Lbx256 Do0.5 Cr255,1,85,1,1]'

    Early stopping would have stopped after stage 111, but training continues because at least 200 was requested. Instead of producing stage 112, 113, 114, ..., it stays at stage 112 and repeats the validation step again and again:

    stage 109/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7366/7366 0:00:00 0:05:14 val_accuracy: 0.87676  early_stopping: 18/20 0.87974
    stage 110/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7366/7366 0:00:00 0:05:18 val_accuracy: 0.87542  early_stopping: 19/20 0.87974
    stage 111/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7366/7366 0:00:00 0:05:18 val_accuracy: 0.87760  early_stopping: 20/20 0.87974
    stage 112/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/7366 -:--:-- 0:00:00  early_stopping: 20/20 0.87974Trainer was signaled to stop but the required `min_epochs=200` or `min_steps=None` has not been met. Training will continue...
    stage 112/∞ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11802/7366 0:00:00 0:08:02 val_accuracy: 0.87345  early_stopping: 20/20 0.87974
    Validation  ━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 223/826    0:00:40 0:00:16                                                     
    
    opened by stweil 6
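The behaviour expected in the report, where training keeps advancing until both the patience limit (`--lag`) and the minimum epoch count (`--min-epochs`) are satisfied, can be written down in a few lines. This is a sketch of that expectation only, not of kraken's or its training framework's internals; `should_stop` and `train` are hypothetical names.

```python
def should_stop(epochs_done, epochs_since_best, lag, min_epochs):
    """Stop only when patience is exhausted AND the minimum epoch count is met."""
    patience_exhausted = epochs_since_best >= lag
    minimum_reached = epochs_done >= min_epochs
    return patience_exhausted and minimum_reached


def train(val_accuracies, lag, min_epochs):
    """Walk over per-epoch validation accuracies and return the number of epochs run."""
    best = float("-inf")
    since_best = 0
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best:
            best, since_best = accuracy, 0
        else:
            since_best += 1
        # The epoch counter advances on every iteration, whether or not
        # the stopping criterion has already been signaled.
        if should_stop(epoch, since_best, lag, min_epochs):
            return epoch
    return len(val_accuracies)


if __name__ == "__main__":
    # Values from the log: stage 111, 20/20 on the early-stopping counter,
    # --lag 20 and --min-epochs 200.
    print(should_stop(111, 20, lag=20, min_epochs=200))
```

With the values from the log the criterion is not yet met at stage 111, so the next stages would be 112, 113 and so on up to 200, rather than a repeated stage 112.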
  • Kraken user-defined metadata keep the base model accuracy scores in the logs


    Hey, I just discovered that when you fine tune a model, the new model keeps the logs from the base model, such that it has the dev score from the first model in model.user_defined_metadata["kraken_meta"])["accuracy"]

    Is it intended ? :)

    opened by PonteIneptique 0
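If the inherited scores are unwanted, one option is to clear them before fine-tuning starts. The sketch below works on a plain dictionary standing in for the model's user-defined metadata. It assumes that the `kraken_meta` entry is a JSON string and that `accuracy` holds a list, both inferred from the expression quoted above rather than confirmed; `reset_accuracy` is a hypothetical helper.

```python
import json


def reset_accuracy(user_defined_metadata):
    """Return a copy of the metadata whose kraken_meta accuracy history is emptied."""
    # Assumption: kraken_meta is stored as a JSON-encoded string.
    meta = json.loads(user_defined_metadata["kraken_meta"])
    meta["accuracy"] = []  # assumption: the history is a list of scores
    updated = dict(user_defined_metadata)
    updated["kraken_meta"] = json.dumps(meta)
    return updated


if __name__ == "__main__":
    # Made-up values for illustration only.
    base = {"kraken_meta": json.dumps({"accuracy": [[1, 0.9]], "note": "base model"})}
    print(reset_accuracy(base)["kraken_meta"])
```

The function returns a new dictionary and leaves the input untouched, so the base model's record remains available for comparison.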
Releases(4.1.2)