Code for the paper "DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks" (ICCV '19)


Last update: Jan 01, 2023


Overview

DewarpNet

This repository contains the code for training DewarpNet.

Recent Updates

[May, 2020] Added evaluation images and an important note about Matlab SSIM.
[Dec, 2020] Added OCR evaluation details.

Training

Prepare Data: create train.txt and val.txt. Their contents should look like:

1/824_8-cp_Page_0503-7Ns0001
1/824_1-cp_Page_0504-2Cw0001
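A minimal sketch of how such split files could be generated. It assumes, as an illustration only, that the images sit under a root directory in numbered sub-folders as .png files, so each entry becomes `<folder>/<name>` without the extension. The function name `write_split_files`, the `val_fraction` parameter, and the directory layout are hypothetical and not part of this repository.

```python
import random
from pathlib import Path


def write_split_files(img_root, out_dir, val_fraction=0.1, seed=0):
    """Write train.txt and val.txt with one '<folder>/<name>' entry per line.

    Assumption: images are stored as <img_root>/<folder>/<name>.png.
    """
    img_root = Path(img_root)
    out_dir = Path(out_dir)

    # Collect '<folder>/<name>' entries, dropping the file extension.
    samples = sorted(
        f"{p.parent.name}/{p.stem}" for p in img_root.glob("*/*.png")
    )

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(samples)

    n_val = int(len(samples) * val_fraction)
    val, train = samples[:n_val], samples[n_val:]

    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "train.txt").write_text("\n".join(train) + "\n")
    (out_dir / "val.txt").write_text("\n".join(val) + "\n")
    return train, val
```

Calling `write_split_files("./data/DewarpNet/doc3d/img", "./data/DewarpNet/doc3d")` would then place both files next to the data, if the layout matches the assumption above.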

Train Shape Network: python trainwc.py --arch unetnc --data_path ./data/DewarpNet/doc3d/ --batch_size 50 --tboard
Train Texture Mapping Network: python trainbm.py --arch dnetccnl --img_rows 128 --img_cols 128 --img_norm --n_epoch 250 --batch_size 50 --l_rate 0.0001 --tboard --data_path ./DewarpNet/doc3d

Inference:

Run: python infer.py --wc_model_path ./eval/models/unetnc_doc3d.pkl --bm_model_path ./eval/models/dnetccnl_doc3d.pkl --show

Evaluation (Image Metrics):

We use the same evaluation code as DocUNet. To reproduce the quantitative results reported in the paper, use the images available here.
[Important note about Matlab version] We noticed that Matlab 2020a uses a different SSIM implementation, which gives a better MS-SSIM score (0.5623) than Matlab 2018b, the version we used. Please compare scores according to your Matlab version.

Evaluation (OCR Metrics):

The 25 images used for OCR evaluation are listed in /eval/ocr_eval/ocr_files.txt
The corresponding ground-truth text is given in /eval/ocr_eval/tess_gt.json
For the OCR errors reported in the paper, we used cv2.blur as pre-processing, which gives higher error in all cases. For convenience, we provide the updated numbers (without blur) in the following table:

Method	ED	CER	ED (no blur)	CER (no blur)
DocUNet	1975.86	0.4656 (0.263)	1671.80	0.403 (0.256)
DocUNet on Doc3D	1684.34	0.3955 (0.272)	1296.00	0.294 (0.235)
DewarpNet	1288.60	0.3136 (0.248)	1007.28	0.249 (0.236)
DewarpNet (ref)	1114.40	0.2692 (0.234)	812.48	0.204 (0.228)

We used the default configuration of Tesseract (v4.1.0) with PyTesseract (v0.2.6) for evaluation.
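For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how ED and CER can be computed from an OCR output and its ground-truth text. It assumes ED is the Levenshtein edit distance and CER is that distance divided by the number of characters in the reference; the exact evaluation script is not shown here, so treat these definitions and the function names `edit_distance` and `character_error_rate` as illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

In practice `hyp` would be the text returned by the OCR engine for an unwarped image and `ref` the matching ground-truth entry; averaging over the 25 evaluation images gives one number per method.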

Models:

Pre-trained models are available here. These models were captured prior to end-to-end training, so they won't give you the end-to-end results reported in Table 2 of the paper. Use the images provided above to get the exact numbers in Table 2.

Dataset:

The doc3D dataset can be downloaded using the scripts here.

More Stuff:

Citation:

If you use the dataset or this code, please consider citing our work:

@inproceedings{SagnikKeICCV2019, 
Author = {Sagnik Das* and Ke Ma* and Zhixin Shu and Dimitris Samaras and Roy Shilkrot}, 
Booktitle = {Proceedings of International Conference on Computer Vision}, 
Title = {DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks}, 
Year = {2019}}

Acknowledgements:

The structure of this code is heavily based on pytorch-semseg.

