Simple and understandable swin-transformer OCR project

Last update: Dec 31, 2022

swin-transformer-ocr

Overview

Simple and understandable swin-transformer OCR project. The model in this repository relies heavily on high-level open-source projects such as timm and x_transformers, and the training procedure is easy to follow thanks to the readability of pytorch-lightning.

The model in this repository encodes the input image into a context vector using 'shifted windows', the swin-transformer encoding mechanism, and decodes that vector with a standard auto-regressive transformer.
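The auto-regressive decoding step described above can be sketched in plain Python. This is only an illustration of the idea, not the repository's implementation: the token ids `BOS` and `EOS`, the function `greedy_decode`, and the stand-in `toy_next_token` are all hypothetical names, and greedy selection is an assumption (the source does not say which sampling strategy is used).

```python
from typing import Callable, List, Sequence

BOS, EOS = 0, 1  # hypothetical special-token ids, chosen for illustration


def greedy_decode(
    context: Sequence[float],
    next_token: Callable[[Sequence[float], List[int]], int],
    max_len: int = 32,
) -> List[int]:
    """Generate tokens one at a time, feeding each prediction back in."""
    tokens = [BOS]
    for _ in range(max_len):
        # The decoder sees the encoder context plus everything emitted so far.
        token = next_token(context, tokens)
        tokens.append(token)
        if token == EOS:
            break
    return tokens[1:]  # drop the BOS marker


def toy_next_token(context: Sequence[float], tokens: List[int]) -> int:
    """Stand-in for a real decoder: emits ids 2, 3, 4 and then EOS."""
    step = len(tokens) - 1
    return 2 + step if step < 3 else EOS
```

In a real model, `next_token` would run the transformer decoder over the image context and the partial token sequence, and the resulting ids would be mapped back to characters by the tokenizer.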

If you are not familiar with the transformer OCR structure, transformer-ocr may be easier to understand because it uses a traditional convolutional network (ResNet-v2) for the encoder.

Performance

On a private Korean handwritten text dataset, the accuracy (exact match) is 97.6%.
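For reference, exact-match accuracy counts a prediction as correct only when the whole predicted string equals the label. The sketch below is a minimal illustration; the function name `exact_match_accuracy` is hypothetical, and it assumes strings are compared verbatim, since the source does not say whether any normalization (whitespace, case) is applied before comparison.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted text equals the label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    # Assumption: verbatim comparison, no normalization of either string.
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)
```

Because a single wrong character makes the whole sample count as incorrect, this metric is stricter than character-level accuracy.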

Data

./dataset/
├─ preprocessed_image/
│  ├─ cropped_image_0.jpg
│  ├─ cropped_image_1.jpg
│  ├─ ...
├─ train.txt
└─ val.txt

# in train.txt
cropped_image_0.jpg\tHello World.
cropped_image_1.jpg\tvision-transformer-ocr
...

You should preprocess the data first. Crop each image to a word-level or sentence-level region, and put all of the cropped images in a single directory. Provide the ground truth in a txt file: each line contains an image file name and its label, separated by a tab (\t).
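A small parser can help check that a label file follows this format before training. This is a sketch under assumptions, not the repository's loader: the names `parse_label_lines` and `load_label_file` are hypothetical, and UTF-8 encoding is assumed.

```python
from typing import Iterable, List, Tuple


def parse_label_lines(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn 'file_name<TAB>label' lines into (file_name, label) pairs."""
    samples = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so the label itself may contain tabs.
        name, sep, label = line.partition("\t")
        if not sep:
            raise ValueError(f"line {line_no}: missing tab separator")
        samples.append((name, label))
    return samples


def load_label_file(path: str, encoding: str = "utf-8") -> List[Tuple[str, str]]:
    """Read a label file such as train.txt (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return parse_label_lines(f)
```

For example, `load_label_file("dataset/train.txt")` would return a list of pairs, and a line without a tab raises an error that names the offending line number.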

Configuration

In the settings/ directory, you can find default.yaml. You can set almost every hyper-parameter in that file. Copy it and edit the copy for each experiment version. I recommend running with the default settings first, before changing anything.

Train

python run.py --version 0 --setting settings/default.yaml --num_workers 16 --batch_size 128

You can check your training log with tensorboard.

tensorboard --logdir tb_logs --bind_all

Predict

When your model finishes training, you can use your model for prediction.

python predict.py --setting <your_setting.yaml> --target <image_or_directory> --tokenizer <your_tokenizer_pkl> --checkpoint <saved_checkpoint>

Exporting to ONNX

You can export your model to ONNX format. This is straightforward thanks to pytorch-lightning. See the related pytorch-lightning documentation.

Citations

@misc{liu-2021,
    title         = {Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
    author        = {Ze Liu and Yutong Lin and Yue Cao and Han Hu and Yixuan Wei and Zheng Zhang and Stephen Lin and Baining Guo},
    year          = {2021},
    eprint        = {2103.14030},
    archivePrefix = {arXiv}
}

Owner

Ha YongWook
