The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM 2021 (Oral).

Overview

Good news! Our new work achieves state-of-the-art performance on the DocUNet benchmark dataset: DocScanner: Robust Document Image Rectification with Progressive Learning

DocTr


DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Any questions or discussions are welcome!

Training

  • For geometric unwarping, we train the GeoTr network on the Doc3D dataset (a minimal training sketch is shown below).
  • For illumination correction, we train the IllTr network on the DRIC dataset.
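
A minimal sketch of what a GeoTr training step could look like is given below. It assumes a hypothetical dataset yielding (distorted image, ground-truth backward map) pairs and a GeoTr-style model that regresses a dense backward mapping; all names and hyperparameters are illustrative, and this is not the repository's actual training script.

    # Hypothetical training sketch for geometric unwarping (not the official script).
    # Assumes a dataset yielding (distorted_image, gt_backward_map) pairs and a model
    # that predicts a dense backward map of shape (B, 2, H, W).
    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader

    def train_geotr(model, dataset, epochs=10, lr=1e-4, device="cuda"):
        loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        model.to(device).train()
        for epoch in range(epochs):
            for image, gt_bm in loader:
                image, gt_bm = image.to(device), gt_bm.to(device)
                pred_bm = model(image)            # predicted backward map, (B, 2, H, W)
                loss = F.l1_loss(pred_bm, gt_bm)  # L1 regression on the mapping
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            print(f"epoch {epoch}: loss {loss.item():.4f}")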

Inference

  1. Download the pretrained models here and put them in $ROOT/model_pretrained/.
  2. Geometric unwarping (the core resampling step is sketched after this list):
    python inference.py
    
  3. Geometric unwarping and illumination rectification:
    python inference.py --ill_rec True
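
Under the hood, geometric unwarping amounts to predicting a backward map and resampling the distorted image with it. The snippet below is an illustrative sketch of that final resampling step, assuming a predicted map with coordinates normalized to [-1, 1]; it is not the exact interface of inference.py.

    # Illustrative rectification step (inference.py wraps logic of this kind).
    # Assumes pred_bm is a predicted backward map of shape (1, H, W, 2) with
    # coordinates normalized to [-1, 1], as expected by F.grid_sample.
    import torch
    import torch.nn.functional as F

    def rectify(distorted: torch.Tensor, pred_bm: torch.Tensor) -> torch.Tensor:
        """distorted: (1, 3, H, W) float image; pred_bm: (1, H, W, 2) backward map."""
        rectified = F.grid_sample(distorted, pred_bm, align_corners=True)
        return rectified.clamp(0, 1)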
    

Evaluation

  • We use the same evaluation code as the DocUNet benchmark dataset, based on MATLAB 2019a.
  • Scores may vary slightly across MATLAB versions, so compare results obtained with the same version.
  • Use the images available here to reproduce the quantitative results reported in the paper and for further comparison (a rough Python cross-check is sketched below).
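
The DocUNet benchmark is commonly scored with MS-SSIM and Local Distortion. As a rough cross-check before running the official MATLAB code, MS-SSIM can be approximated in Python with the pytorch_msssim package; its scores will not exactly match the MATLAB implementation, and the file names and resize resolution below are placeholders.

    # Rough MS-SSIM cross-check (not a substitute for the official MATLAB evaluation).
    import numpy as np
    import torch
    from PIL import Image
    from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

    def load_gray(path, size=(598, 440)):
        """Load an image as a (1, 1, H, W) grayscale tensor, resized to a common size."""
        img = Image.open(path).convert("L").resize(size)
        return torch.from_numpy(np.array(img)).float()[None, None]

    rectified = load_gray("rectified.png")      # placeholder file names
    reference = load_gray("ground_truth.png")
    print(f"MS-SSIM: {ms_ssim(rectified, reference, data_range=255).item():.4f}")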

Citation

If you find this code useful for your research, please cite it with the following BibTeX entries.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}
@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}