Fine tuning keras-ocr python package with custom synthetic dataset from scratch

Overview

OCR-Pipeline-with-Keras

The keras-ocr package generally consists of two parts: a Detector and a Recognizer:

  • Detector is responsible for creating bounding boxes for the words of the text.
  • Recognizer is responsible for processing batch of cropped parts of the initial image.

Keras-ocr connects this two parts into seamless pipeline. "Out of the box", it can handle a wide range of images with texts. But in a specific task, when the field of possible images with texts is greatly narrowed, it shows itself badly in the Recognizer part of the task.

In this regard, the task of fine-tuning Recognizer on a custom dataset was set.


Virtual environment and packages

$ python3 -m venv keras_ocr
$ pip install keras-ocr

And TRDG library for synthetic text generation.

$ pip install trdg

Synthetic data generation

We will use the TRDG library to generate synthetic text. All necessary code presented in the data_generation.py. Things you need to know:

  • You choose template for generating text, e.g. if template is "({}{}/{})", then all brackets will be randomly filled with symbols from alphabet. You need to specify your own instance of StringTemplate classs.

  • You choose the alphabet. In our example case it contains only digits. P.S. Some of the repeated in data_generation.py, hence emperical distribution probability for each symbol defined as fraction of n_repeats to alphabet_size.

  • You can choose your own fonts. To do this, follow instruction:

    1. Download needed fonts as .ttf files
    2. Go to trdg fonts directory ./keras_ocr/lib/python3.8/site-packages/trdg/fonts/
    3. Create directory $ mkdir cs (cs means custom fonts), you can chooce the disered name
    4. Place fonts files in this dir
    5. (For Mac users only) Don't forget to remove .DS_Store from this folder
  • You can chooce image background for text. When creating instance of GeneratorFromStrings in function generate_data_units(...), provide folder with images with arg image_dir

High-level API in the data_generation.py
data_generator = DataGenerator(string_templates=[StringTemplate('{}{}{}{}{}{}{}', 7)])

data_generator.generate(n_patches=20000, n_total_samples=550, path='DigitsBracketsDataset/train')
  • n_patches -- number of different strings from provided template
  • n_total_samples -- number of total samples from patches
  • path -- dir to save samples

Fine tuning Recognizer

Follow instruction in fine_tuning.ipynb. Don't forget to add function get_custom_dataset(...) to datasets.py in keras-ocr package directory (./keras_ocr/lib/python3.8/site-packages/keras_ocr/datasets.py):

def get_custom_dataset(path: str, split: str):
    """
    param: path: path to dataset root dir (include train/test dirs)
    Returns:
        A recognition dataset as a list of (filepath, box, word) tuples
    """
    data = []
    if split == 'train':
        train_dir = os.path.join(path, 'train')
        data.extend(
            _read_born_digital_labels_file(
                labels_filepath=os.path.join(train_dir, "gt.txt"),
                image_folder=train_dir,
            )
        )
    elif split == 'test':
        test_dir = os.path.join(path, 'test')
        data.extend(
            _read_born_digital_labels_file(
                labels_filepath=os.path.join(test_dir, 'gt.txt'), 
                image_folder=test_dir
            )
        )
    return data 
Owner
Eugene
Eugene
An organized collection of tutorials and projects created for aspriring computer vision students.

A repository created with the purpose of teaching students in BME lab 308A- Hanoi University of Science and Technology

Givralnguyen 5 Nov 24, 2021
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition:

Multi-Type-TD-TSR Check it out on Source Code of our Paper: Multi-Type-TD-TSR Extracting Tables from Document Images using a Multi-stage Pipeline for

Pascal Fischer 178 Dec 27, 2022
QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.

Application-Oriented Performance Benchmarks for Quantum Computing This repository contains a collection of prototypical application- or algorithm-cent

SRI International 67 Nov 30, 2022
Dirty, ugly, and hopefully useful OCR of Facebook Papers docs released by Gizmodo

Quick and Dirty OCR of Facebook Papers Gizmodo has been working through the Facebook Papers and releasing the docs that they process and review. As lu

Bill Fitzgerald 2 Oct 28, 2021
Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and limited )

GTA-5-Lane-detection Just a script for detecting the lanes in any car game (not just gta 5) with specific resolution and road design ( very basic and

Danciu Georgian 4 Aug 01, 2021
Demo processor to illustrate OCR-D Python API

ocrd_vandalize/ Demo processor to illustrate the OCR-D/core Python API Description :TODO: write docs :) Installation From PyPI pip3 install ocrd_vanda

Konstantin Baierer 5 May 05, 2022
A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.

About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I

Fauzan F A 41 Dec 30, 2022
Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight'

SSTDNet Implement 'Single Shot Text Detector with Regional Attention, ICCV 2017 Spotlight' using pytorch. This code is work for general object detecti

HotaekHan 84 Jan 05, 2022
pyntcloud is a Python library for working with 3D point clouds.

pyntcloud is a Python library for working with 3D point clouds.

David de la Iglesia Castro 1.2k Jan 07, 2023
scantailor - Scan Tailor is an interactive post-processing tool for scanned pages.

Scan Tailor - scantailor.org This project is no longer maintained, and has not been maintained for a while. About Scan Tailor is an interactive post-p

1.5k Dec 28, 2022
PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector

Description This is a PyTorch Re-Implementation of EAST: An Efficient and Accurate Scene Text Detector. Only RBOX part is implemented. Using dice loss

365 Dec 20, 2022
Toolbox for OCR post-correction

Ochre Ochre is a toolbox for OCR post-correction. Please note that this software is experimental and very much a work in progress! Overview of OCR pos

National Library of the Netherlands / Research 117 Nov 10, 2022
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. This is the official Roboflow python package that interfaces with the Roboflow API.

Roboflow 52 Dec 23, 2022
This is a GUI program which consist of 4 OpenCV projects

Tkinter-OpenCV Project Using Tkinter, Opencv, Mediapipe This is a python GUI program using Tkinter which consist of 4 OpenCV projects 1. Finger Counte

Arya Bagde 3 Feb 22, 2022
A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

1 Nov 08, 2021
Pytorch implementation of PSEnet with Pyramid Attention Network as feature extractor

Scene Text-Spotting based on PSEnet+CRNN Pytorch implementation of an end to end Text-Spotter with a PSEnet text detector and CRNN text recognizer. We

azhar shaikh 62 Oct 10, 2022
🔎 Like Chardet. 🚀 Package for encoding & language detection. Charset detection.

Charset Detection, for Everyone 👋 The Real First Universal Charset Detector A library that helps you read text from an unknown charset encoding. Moti

TAHRI Ahmed R. 332 Dec 31, 2022
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 07, 2023
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted. ocrmypdf # it's a scriptable c

jbarlow83 7.9k Jan 03, 2023
A Joint Video and Image Encoder for End-to-End Retrieval

Frozen️ in Time ❄️ ️️️️ ⏳ A Joint Video and Image Encoder for End-to-End Retrieval (arXiv) Repository to contain the code, models, data for end-to-end

225 Dec 25, 2022