ocroseg - This is a deep learning model for page layout analysis / segmentation.

Overview

It can be trained and run in many different ways; by default, it simply returns the text lines in a page image.

Segmentation

Segmentation is carried out using the ocroseg.Segmenter class. This needs a model that you can download or train yourself.

%%bash
model=lowskew-000000259-011440.pt
test -f $model || wget --quiet -nd https://storage.googleapis.com/tmb-models/$model
%pylab inline
rc("image", cmap="gray", interpolation="bicubic")
figsize(10, 10)
Populating the interactive namespace from numpy and matplotlib

The Segmenter object handles page segmentation using a DL model.

import ocroseg
seg = ocroseg.Segmenter("lowskew-000000259-011440.pt")
seg.model
Sequential(
  (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True)
  (2): ReLU()
  (3): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (4): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (6): ReLU()
  (7): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (8): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
  (10): ReLU()
  (11): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (12): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
  (13): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True)
  (14): ReLU()
  (15): LSTM2(
    (hlstm): RowwiseLSTM(
      (lstm): LSTM(32, 32, bidirectional=1)
    )
    (vlstm): RowwiseLSTM(
      (lstm): LSTM(64, 32, bidirectional=1)
    )
  )
  (16): Conv2d(64, 1, kernel_size=(1, 1), stride=(1, 1))
  (17): Sigmoid()
)
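
As a rough check on the printout above, the spatial size of the network's output can be worked out by hand. The 3x3 convolutions use padding 1 and the 1x1 convolutions need none, so only the two MaxPool2d layers change the size. The sketch below does that arithmetic for a 2000 x 2592 page. It assumes the LSTM2 layers preserve height and width, which the printout does not show. The helper names conv_out and output_shape are made up for this illustration, and whether Segmenter rescales the result afterwards is not covered here.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def output_shape(height, width):
    """Trace a page's spatial size through the Conv2d / MaxPool2d layers printed above."""
    # (kernel, stride, padding) per layer, in order of appearance in the printout.
    layers = [
        (3, 1, 1),  # (0)  Conv2d 3x3, padding 1: size preserved
        (2, 2, 0),  # (3)  MaxPool2d 2x2, stride 2: size halved
        (3, 1, 1),  # (4)  Conv2d 3x3, padding 1
        (2, 2, 0),  # (7)  MaxPool2d 2x2, stride 2
        (3, 1, 1),  # (8)  Conv2d 3x3, padding 1
        (1, 1, 0),  # (12) Conv2d 1x1
        (1, 1, 0),  # (16) Conv2d 1x1
    ]
    # Assumption: the LSTM2 layers (11) and (15) keep height and width unchanged.
    for kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return height, width


print(output_shape(2000, 2592))  # (500, 648)
```

Under these assumptions, the map the model predicts is a quarter of the page size along each axis.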

Let's segment a page with this.

image = 1.0 - imread("testdata/W1P0.png")[:2000]
print(image.shape)
imshow(image)
(2000, 2592)





<matplotlib.image.AxesImage at 0x7f6078b09690>


The extract_textlines method returns a list of text line images, bounding boxes, etc.

lines = seg.extract_textlines(image)
imshow(lines[0]['image'])
<matplotlib.image.AxesImage at 0x7f60781c05d0>


The segmenter accomplishes this by predicting seeds for each text line. With a bit of mathematical morphology, these seeds are then extended into a text line segmentation.

imshow(seg.lines)
<matplotlib.image.AxesImage at 0x7f60781a5510>
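
The idea of growing seeds into a full line segmentation can be illustrated on a toy example. The sketch below is not ocroseg's implementation, whose morphological steps are not shown in this document. It swaps in a simple stand-in: each connected seed gets a label, and labels are then spread breadth-first over touching ink pixels. The function names label_seeds and grow_labels and the tiny arrays are assumptions made for this illustration.

```python
from collections import deque

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def label_seeds(seeds):
    """Give every 4-connected blob of seed pixels its own label 1, 2, ..."""
    height, width = len(seeds), len(seeds[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not seeds[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and seeds[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def grow_labels(labels, ink):
    """Spread labels over touching ink pixels; the first label to arrive wins."""
    height, width = len(labels), len(labels[0])
    grown = [row[:] for row in labels]
    queue = deque((y, x) for y in range(height) for x in range(width) if grown[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < height and 0 <= nx < width
                    and ink[ny][nx] and not grown[ny][nx]):
                grown[ny][nx] = grown[cy][cx]
                queue.append((ny, nx))
    return grown


# Toy page: two text lines (rows 1-2 and rows 4-5) separated by a blank row.
ink = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
# One thin seed per text line, as a thresholded prediction might look.
seeds = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
labels, count = label_seeds(seeds)
segmentation = grow_labels(labels, ink)
for row in segmentation:
    print(row)
```

Each text line ends up with a single label because the blank row between the lines stops the growth. Pixels of one label could then be cropped out as one line image.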


Training

The text line segmenter is trained using pairs of page images and line images stored in tar files.

%%bash
tar -ztvf testdata/framedlines.tgz | sed 6q
-rw-rw-r-- tmb/tmb      110404 2017-03-19 16:47 A001BIN.framed.png
-rw-rw-r-- tmb/tmb       10985 2017-03-16 16:15 A001BIN.lines.png
-rw-rw-r-- tmb/tmb       74671 2017-03-19 16:47 A002BIN.framed.png
-rw-rw-r-- tmb/tmb        8528 2017-03-16 16:15 A002BIN.lines.png
-rw-rw-r-- tmb/tmb      147716 2017-03-19 16:47 A003BIN.framed.png
-rw-rw-r-- tmb/tmb       12023 2017-03-16 16:15 A003BIN.lines.png


tar: write error
from dlinputs import tarrecords
sample = next(tarrecords.tariterator(open("testdata/framedlines.tgz", "rb")))
subplot(121); imshow(sample["framed.png"])
subplot(122); imshow(sample["lines.png"])
<matplotlib.image.AxesImage at 0x7f60e3d9bc10>
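
To make the pairing convention concrete, the sketch below groups tar members that share a base name into one sample, using only the standard library. It is an illustration of the file layout listed above, not the dlinputs implementation (which also decodes the images). The function names iter_samples and make_demo_archive are hypothetical, and the demo archive holds placeholder bytes instead of real PNG data.

```python
import io
import tarfile


def iter_samples(fileobj):
    """Yield one dict per sample, grouping consecutive members that share a key.

    The key is the member name up to the first dot ("A001BIN"); the rest
    ("framed.png", "lines.png") becomes the field name inside the sample.
    """
    current = None
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            key, _, field = member.name.partition(".")
            if current is None or current["__key__"] != key:
                if current is not None:
                    yield current
                current = {"__key__": key}
            current[field] = archive.extractfile(member).read()
    if current is not None:
        yield current


def make_demo_archive():
    """Build a small in-memory .tgz with the same naming scheme (placeholder contents)."""
    buffer = io.BytesIO()
    names = ("A001BIN.framed.png", "A001BIN.lines.png",
             "A002BIN.framed.png", "A002BIN.lines.png")
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in names:
            payload = name.encode("ascii")  # stand-in bytes, not a real image
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


samples = list(iter_samples(make_demo_archive()))
for sample in samples:
    print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

The grouping relies on the page image and its line image being stored next to each other in the archive, as they are in the listing above.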


There are also some tools for data augmentation.

Generally, you can train these kinds of segmenters on any kind of image data, though they work best on properly binarized, rotation- and skew-normalized page images. Note that by convention, pages are white on black. Make sure that the model you load matches the kind of pages you are trying to segment.
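
The white-on-black convention is why the segmentation example inverts its input with 1.0 - imread(...). A minimal sketch of a helper that enforces the convention is shown below. The function name to_white_on_black and the mean-brightness heuristic are assumptions made for this illustration; they are not part of ocroseg.

```python
import numpy as np


def to_white_on_black(page, threshold=0.5):
    """Return a float32 copy of `page` in [0, 1] with bright text on a dark background.

    Assumption: text covers less than half of the page, so a mean brightness
    above `threshold` means the input is dark text on a light background.
    """
    page = np.array(page, dtype=np.float32)
    if page.max() > 1.0:         # e.g. 8-bit images with values in 0..255
        page = page / 255.0
    if page.mean() > threshold:  # mostly bright, so black-on-white: invert
        page = 1.0 - page
    return page


# A 3x3 "scan" with one dark ink pixel on a white background.
scan = np.array([[255, 255, 255],
                 [255, 0, 255],
                 [255, 255, 255]], dtype=np.uint8)
result = to_white_on_black(scan)
print(result)
```

The heuristic fails on pages that are mostly ink, so for real data it is safer to know the polarity of your scans and invert explicitly.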

The actual models used are pretty complex and require LSTMs to function well, but for demonstration purposes, let's define and use a tiny layout analysis model. Look in bigmodel.py for a realistic model.

%%writefile tinymodel.py
from torch import nn  # makes the file self-contained

def make_model():
    r = 3
    model = nn.Sequential(
        nn.Conv2d(1, 8, r, padding=r//2),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
        nn.Conv2d(8, 1, r, padding=r//2),
        nn.Sigmoid()
    )
    return model
Writing tinymodel.py
%%bash
./ocroseg-train -d testdata/framedlines.tgz --maxtrain 10 -M tinymodel.py --display 0
raw sample:
__key__ 'A001BIN'
__source__ 'testdata/framedlines.tgz'
lines.png float32 (3300, 2592)
png float32 (3300, 2592)

preprocessed sample:
__key__ <type 'list'> ['A002BIN']
__source__ <type 'list'> ['testdata/framedlines.tgz']
input float32 (1, 3300, 2592, 1)
mask float32 (1, 3300, 2592, 1)
output float32 (1, 3300, 2592, 1)

ntrain 0
model:
Sequential(
  (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (3): Conv2d(8, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): Sigmoid()
)

0 0 ['A006BIN'] 0.24655306 ['A006BIN'] 0.31490618 0.55315816 lr 0.03
1 1 ['A007BIN'] 0.24404158 ['A007BIN'] 0.30752876 0.54983306 lr 0.03
2 2 ['A004BIN'] 0.24024434 ['A004BIN'] 0.31007746 0.54046077 lr 0.03
3 3 ['A008BIN'] 0.23756175 ['A008BIN'] 0.30573484 0.5392694 lr 0.03
4 4 ['A00LBIN'] 0.22300518 ['A00LBIN'] 0.28594157 0.52989864 lr 0.03
5 5 ['A00MBIN'] 0.22032338 ['A00MBIN'] 0.28086954 0.52204597 lr 0.03
6 6 ['A00DBIN'] 0.22794804 ['A00DBIN'] 0.27466372 0.512208 lr 0.03
7 7 ['A009BIN'] 0.22404794 ['A009BIN'] 0.27621177 0.51116604 lr 0.03
8 8 ['A001BIN'] 0.22008553 ['A001BIN'] 0.27836022 0.5008192 lr 0.03
9 9 ['A00IBIN'] 0.21842314 ['A00IBIN'] 0.26755702 0.4992323 lr 0.03
Owner
NVIDIA Research Projects