Fine-tuning the keras-ocr Python package on a custom synthetic dataset from scratch

Overview

[Figure: OCR pipeline with Keras]

The keras-ocr package consists of two main parts, a Detector and a Recognizer:

  • The Detector creates bounding boxes around the words in the image.
  • The Recognizer reads the text in a batch of crops cut out of the initial image along those boxes.

Keras-ocr connects these two parts into a seamless pipeline. Out of the box, it handles a wide range of images containing text. But in a specific task, where the range of possible images is greatly narrowed, it performs poorly, and the weak point is the Recognizer.
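To make the data flow concrete, here is a toy sketch of the two-stage structure in plain Python. It is not keras-ocr code: `run_two_stage_ocr`, `crop` and the stub detector/recognizer are hypothetical, images are nested lists, and boxes are axis-aligned rectangles, whereas keras-ocr works with four-point boxes. Only the shape of the flow is meant to match: detect boxes, crop each one, recognize all crops as a batch, and return (word, box) pairs.

```python
from typing import Callable, List, Sequence, Tuple

# Toy types: an image is a list of pixel rows, and a box is an axis-aligned
# (x0, y0, x1, y1) rectangle. keras-ocr itself uses four-point boxes.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle [x0, x1) x [y0, y1) out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_two_stage_ocr(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Sequence[Image]], Sequence[str]],
) -> List[Tuple[str, Box]]:
    """Stage 1 finds word boxes; stage 2 reads all crops in one batch."""
    boxes = list(detect(image))
    crops = [crop(image, box) for box in boxes]  # one crop per detected word
    words = recognize(crops)  # the recognizer sees the whole batch at once
    return list(zip(words, boxes))  # (word, box) pairs


if __name__ == "__main__":
    toy_image = [[r * 10 + c for c in range(6)] for r in range(4)]

    def fake_detector(image: Image) -> List[Box]:
        # Hypothetical stand-in for the Detector: two fixed boxes.
        return [(0, 0, 2, 2), (3, 1, 6, 3)]

    def fake_recognizer(crops: Sequence[Image]) -> List[str]:
        # Hypothetical stand-in for the Recognizer: "reads" each crop's width.
        return [f"width={len(c[0])}" for c in crops]

    print(run_two_stage_ocr(toy_image, fake_detector, fake_recognizer))
    # [('width=2', (0, 0, 2, 2)), ('width=3', (3, 1, 6, 3))]
```

In this project only the recognize stage is fine-tuned; the detect stage is left as it is.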

This project therefore fine-tunes the Recognizer on a custom dataset.


Virtual environment and packages

$ python3 -m venv keras_ocr
$ source keras_ocr/bin/activate
$ pip install keras-ocr

Then install the TRDG (TextRecognitionDataGenerator) library for synthetic text generation:

$ pip install trdg
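The font directory and datasets.py paths used in this README assume a virtual environment called keras_ocr running Python 3.8 (./keras_ocr/lib/python3.8/site-packages/...). If your Python version differs, a small standard-library sketch like the one below can locate the installed package directories instead. `package_dir` is a hypothetical helper, not part of either library; run it with the virtual environment activated.

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of an installed top-level package, or None if it is missing."""
    spec = importlib.util.find_spec(name)  # locates a top-level package without importing it
    if spec is None:
        return None
    if spec.submodule_search_locations:  # packages list their directory here
        return list(spec.submodule_search_locations)[0]
    if spec.origin:  # plain single-file module: use the folder that contains it
        return os.path.dirname(spec.origin)
    return None


if __name__ == "__main__":
    for pkg in ("trdg", "keras_ocr"):
        location = package_dir(pkg)
        print(pkg, "->", location if location else "not installed")
```

With both packages installed, the fonts directory is then os.path.join(package_dir("trdg"), "fonts"), and the file to edit for the custom dataset is os.path.join(package_dir("keras_ocr"), "datasets.py").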

Synthetic data generation

We will use the TRDG library to generate synthetic text. All the necessary code is in data_generation.py. Things you need to know:

  • You choose a template for the generated text: e.g. if the template is "({}{}/{})", every {} placeholder is filled with a random symbol from the alphabet. You need to specify your own instance of the StringTemplate class.

  • You choose the alphabet. In our example it contains only digits. Note that some symbols are repeated in the alphabet in data_generation.py, so the empirical probability of each symbol is its number of repeats divided by the alphabet size.

  • You can choose your own fonts. To do this, follow these steps:

    1. Download the fonts you need as .ttf files
    2. Go to the trdg fonts directory ./keras_ocr/lib/python3.8/site-packages/trdg/fonts/ (adjust python3.8 to your Python version)
    3. Create a directory for them: $ mkdir cs (cs stands for custom fonts; you can choose any name)
    4. Place the font files in this directory
    5. (macOS only) Don't forget to remove .DS_Store from this directory
  • You can choose an image background for the text. When creating the GeneratorFromStrings instance in the function generate_data_units(...), pass the folder with background images as the image_dir argument.
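The repeated-symbol rule for the alphabet can be checked numerically. A minimal sketch, assuming the alphabet is a plain list from which symbols are drawn uniformly (as random.choice does), so a symbol repeated k times in an alphabet of size N is drawn with probability k/N. The alphabet below is made up for illustration and is not the one in data_generation.py.

```python
import random
from collections import Counter
from typing import Dict, Sequence


def symbol_probabilities(alphabet: Sequence[str]) -> Dict[str, float]:
    """Empirical probability of each symbol: n_repeats / alphabet_size."""
    counts = Counter(alphabet)
    return {symbol: n / len(alphabet) for symbol, n in counts.items()}


# Hypothetical alphabet: all ten digits, with '1' added twice more and '7' once more.
ALPHABET = list("0123456789") + ["1", "1", "7"]

if __name__ == "__main__":
    probs = symbol_probabilities(ALPHABET)
    print(probs["1"], probs["7"], probs["0"])  # 3/13, 2/13 and 1/13 as floats

    # Uniform sampling from the list realizes exactly this distribution.
    rng = random.Random(0)
    draws = Counter(rng.choice(ALPHABET) for _ in range(13_000))
    print(draws["1"] / 13_000)  # close to 3/13, about 0.23
```

Repeating entries is a simple way to bias the generator toward symbols that are frequent in your real images without changing the sampling code.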

High-level API in data_generation.py
data_generator = DataGenerator(string_templates=[StringTemplate('{}{}{}{}{}{}{}', 7)])

data_generator.generate(n_patches=20000, n_total_samples=550, path='DigitsBracketsDataset/train')
  • n_patches -- number of different strings generated from the provided template
  • n_total_samples -- total number of samples drawn from the patches
  • path -- directory in which to save the samples
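The template mechanism and the meaning of n_patches can be illustrated with a standalone sketch. This is not the StringTemplate/DataGenerator code from data_generation.py: fill_template and unique_strings are hypothetical helpers that fill each {} with a random alphabet symbol and, following the description of n_patches as the number of different strings, keep drawing until they have n distinct ones. They also check the obvious limit: a template with k placeholders over an alphabet of m distinct symbols yields at most m**k different strings.

```python
import random
from typing import List, Optional, Sequence

PLACEHOLDER = "{}"


def fill_template(template: str, alphabet: Sequence[str], rng: random.Random) -> str:
    """Replace every '{}' in the template with a random symbol from the alphabet."""
    n_slots = template.count(PLACEHOLDER)
    return template.format(*(rng.choice(alphabet) for _ in range(n_slots)))


def unique_strings(
    template: str, alphabet: Sequence[str], n: int, seed: Optional[int] = None
) -> List[str]:
    """Draw n distinct strings from the template by rejection sampling."""
    n_slots = template.count(PLACEHOLDER)
    n_possible = len(set(alphabet)) ** n_slots  # repeats do not add new strings
    if n > n_possible:
        raise ValueError(f"template allows only {n_possible} distinct strings, asked for {n}")
    rng = random.Random(seed)
    seen = set()
    result: List[str] = []
    while len(result) < n:
        candidate = fill_template(template, alphabet, rng)
        if candidate not in seen:  # skip duplicates, keep first-seen order
            seen.add(candidate)
            result.append(candidate)
    return result


if __name__ == "__main__":
    digits = "0123456789"
    print(unique_strings("({}{}/{})", digits, 3, seed=42))
    print(len(unique_strings("{}{}{}{}{}{}{}", digits, 20000, seed=0)))  # 20000
```

For the 7-placeholder digit template above there are 10**7 possible strings, so n_patches=20000 is well within range; a short template such as "({}{}/{})" allows only 1000.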

Fine tuning Recognizer

Follow the instructions in fine_tuning.ipynb. Don't forget to add the function get_custom_dataset(...) to datasets.py in the keras-ocr package directory (./keras_ocr/lib/python3.8/site-packages/keras_ocr/datasets.py):

def get_custom_dataset(path: str, split: str):
    """
    Load a custom recognition dataset laid out as <path>/train and <path>/test,
    each directory containing the images and a gt.txt labels file.

    Args:
        path: Path to the dataset root directory (contains the train/test directories).
        split: Which split to load, 'train' or 'test'.
    Returns:
        A recognition dataset as a list of (filepath, box, word) tuples
    """
    # `os` and `_read_born_digital_labels_file` are already available inside
    # keras_ocr/datasets.py, where this function is meant to be pasted.
    if split not in ('train', 'test'):
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    split_dir = os.path.join(path, split)
    return _read_born_digital_labels_file(
        labels_filepath=os.path.join(split_dir, 'gt.txt'),
        image_folder=split_dir,
    )
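get_custom_dataset reads each split's gt.txt with keras-ocr's internal _read_born_digital_labels_file. In the keras-ocr source we have seen, that reader expects one line per image of the form filename, "word": it splits each line on commas, takes the first field as the image file name, re-joins the rest and strips the surrounding quotes. The sketch below, with hypothetical helpers write_gt_file and read_gt_file, writes labels in that format and mimics the reading logic so you can sanity-check a generated split before training; verify the format against your installed keras-ocr version.

```python
import os
import tempfile
from typing import Iterable, List, Tuple


def write_gt_file(split_dir: str, samples: Iterable[Tuple[str, str]]) -> str:
    """Write (image_filename, word) pairs to split_dir/gt.txt as: filename, "word"."""
    gt_path = os.path.join(split_dir, "gt.txt")
    with open(gt_path, "w", encoding="utf-8") as f:
        for filename, word in samples:
            f.write(f'{filename}, "{word}"\n')
    return gt_path


def read_gt_file(split_dir: str) -> List[Tuple[str, None, str]]:
    """Mimic the born-digital reader: (filepath, box=None, word) tuples.

    Unlike the original, blank lines are skipped here.
    """
    with open(os.path.join(split_dir, "gt.txt"), encoding="utf-8-sig") as f:
        rows = [line.strip().split(",") for line in f if line.strip()]
    return [
        # Words may contain commas, so everything after the first field is re-joined.
        (os.path.join(split_dir, seg[0]), None, ",".join(seg[1:]).strip()[1:-1])
        for seg in rows
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        train_dir = os.path.join(root, "train")
        os.makedirs(train_dir)
        # Hypothetical file names; use whatever names your generator saved.
        write_gt_file(train_dir, [("0.jpg", "1234567"), ("1.jpg", "(12/3)")])
        for filepath, box, word in read_gt_file(train_dir):
            print(os.path.basename(filepath), box, word)
        # 0.jpg None 1234567
        # 1.jpg None (12/3)
```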

Owner: Eugene