Papers, Datasets, Algorithms, SOTA for STR. Long-time Maintaining

Overview

Scene Text Recognition Recommendations


Everythin about Scene Text Recognition

SOTA Papers Datasets Code

Contents

1.Papers

All Papers Can be Find Here

  • Latest Papers:
up to (2021-12-8)
up to (2021-12-3)
up to (2021-11-25)

2.Datasets

2.1 Synthetic Datasets

Dataset Description Examples BaiduNetdisk link
SynthText 9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto nartural images with random transformations SynthText Scene text datasets(提取码:emco)
MJSynth 6 million synthetic text instances. It's a generation of SynthText. MJText Scene text datasets(提取码:emco)

2.2 Benchmarks

Dataset Description Examples BaiduNetdisk link
IIIT5k-Words(IIIT5K) 3000 test images instances. Take from street scenes and from originally-digital images IIIT5K Scene text datasets(提取码:emco)
Street View Text(SVT) 647 test images instances. Some images are severely corrupted by noise, blur, and low resolution SVT Scene text datasets(提取码:emco)
StreetViewText-Perspective(SVT-P) 639 test images instances. It is specifically designed to evaluate perspective distorted textrecognition. It is built based on the original SVT dataset by selecting the images at the sameaddress on Google Street View but with different view angles. Therefore, most text instancesare heavily distorted by the non-frontal view angle. SVTP Scene text datasets(提取码:emco)
ICDAR 2003(IC03) 867 test image instances IC03 Scene text datasets(提取码:mfir)
ICDAR 2013(IC13) 1015 test images instances IC13 Scene text datasets(提取码:emco)
ICDAR 2015(IC15) 2077 test images instances. As text images were taken by Google Glasses without ensuringthe image quality, most of the text is very small, blurred, and multi-oriented IC15 Scene text datasets(提取码:emco)
CUTE80(CUTE) 288 It focuses on curved text recognition. Most images in CUTE have acomplex background, perspective distortion, and poor resolution CUTE Scene text datasets(提取码:emco)

3.1 Public Code

3.1. Frameworks

PaddleOCR (百度)

  • PaddlePaddle/PaddleOCR
  • 特性 (截取至PaddleOCR):
    • 使用百度自研深度学习框架PaddlePaddle搭建
    • PP-OCR系列高质量预训练模型,准确的识别效果
      • 超轻量PP-OCRv2系列:检测(3.1M)+ 方向分类器(1.4M)+ 识别(8.5M)= 13.0M
      • 超轻量PP-OCR mobile移动端系列:检测(3.0M)+方向分类器(1.4M)+ 识别(5.0M)= 9.4M
      • 通用PPOCR server系列:检测(47.1M)+方向分类器(1.4M)+ 识别(94.9M)= 143.4M
      • 支持中英文数字组合识别、竖排文本识别、长文本识别
      • 支持多语言识别:韩语、日语、德语、法语
      • 丰富易用的OCR相关工具组件
    • 半自动数据标注工具PPOCRLabel:支持快速高效的数据标注
      • 数据合成工具Style-Text:批量合成大量与目标场景类似的图像
      • 文档分析能力PP-Structure:版面分析与表格识别
      • 支持用户自定义训练,提供丰富的预测推理部署方案
      • 支持PIP快速安装使用
      • 可运行于Linux、Windows、MacOS等多种系统
  • 支持算法(识别):
    • CRNN
    • Rosetta
    • STAR-Net
    • RARE
    • SRN
    • NRTR

MMOCR (商汤)

  • open-mmlab/mmocr
  • 特性(截取至MMOCR):
    • MMOCR 是基于 PyTorchmmdetection 的开源工具箱,专注于文本检测,文本识别以及相应的下游任务,如关键信息提取。 它是 OpenMMLab 项目的一部分。
    • 该工具箱不仅支持文本检测和文本识别,还支持其下游任务,例如关键信息提取。
  • 支持算法(识别)
    • CRNN (TPAMI'2016)
    • NRTR (ICDAR'2019)
    • RobustScanner (ECCV'2020)
    • SAR (AAAI'2019)
    • SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
    • SegOCR (Manuscript'2021)

Deep Text Recognition Benchmark (ClovaAI)


3.2. Algorithms

CRNN


ASTER

  • Tensorflow, official, 651 : bgshih/aster
    • 官方实现版本,使用Tensorflow
  • Pytorch, 535 :ayumuymk/aster.pytorch
    • Pytorch版本,准确率相较原文有明显提升

MORANv2

  • Pytorch, official, 572 :Canjie-Luo/MORAN_v2
    • MORAN v2版本。更加稳定的单阶段训练,更换ResNet做backbone,使用双向解码器

4.SOTA

Regular Dataset Irregular  dataset
Model Year IIIT SVT IC13(857) IC13(1015) IC15(1811) IC15(2077) SVTP CUTE
CRNN  2015 78.2 80.8 - 86.7 - - - -
ASTER(L2R)  2015 92.67 91.16 - 90.74 76.1 - 78.76 76.39
CombBest  2019 87.9 87.5 93.6 92.3 77.6 71.8 79.2 74
ESIR 2019 93.3 90.2 - 91.3 - 76.9 79.6 83.3
SE-ASTER  2020 93.8 89.6 - 92.8 80 81.4 83.6
DAN  2020 94.3 89.2 - 93.9 - 74.5 80 84.4
RobustScanner 2020 95.3 88.1 - 94.8 - 77.1 79.5 90.3
AutoSTR  2020 94.7 90.9 - 94.2 81.8 - 81.7 -
Yang et al.  2020 94.7 88.9 - 93.2 79.5 77.1 80.9 85.4
SATRN  2020 92.8 91.3 - 94.1 - 79 86.5 87.8
SRN  2020 94.8 91.5 95.5 - 82.7 - 85.1 87.8
GA-SPIN  2021 95.2 90.9 - 94.8 82.8 79.5 83.2 87.5
PREN2D  2021 95.6 94 96.4 - 83 - 87.6 91.7
Bhunia et al.  2021 95.2 92.2 - 95.5 - 84 85.7 89.7
VisionLAN  2021 95.8 91.7 95.7 - 83.7 - 86 88.5
ABINet  2021 96.2 93.5 97.4 - 86.0 - 89.3 89.2
MATRN 2021 96.7 94.9 97.9 95.8 86.6 82.9 90.5 94.1

Baek's Reimplementation Version

img

Owner
Deep Learning and Vision Computing Lab, SCUT
Deep Learning and Vision Computing Lab, SCUT
Python Computer Vision application that allows users to draw/erase on the screen using their webcam.

CV-Virtual-WhiteBoard The Virtual WhiteBoard is a project I made using the OpenCV and Mediapipe Python libraries. Using your index and middle finger y

Stephen Wang 1 Jan 07, 2022
CRAFT-Pyotorch:Character Region Awareness for Text Detection Reimplementation for Pytorch

CRAFT-Reimplementation Note:If you have any problems, please comment. Or you can join us weChat group. The QR code will update in issues #49 . Reimple

453 Dec 28, 2022
Text layer for bio-image annotation.

napari-text-layer Napari text layer for bio-image annotation. Installation You can install using pip: pip install napari-text-layer Keybindings and m

6 Sep 29, 2022
Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

hocr-tools About About the code Installation System-wide with pip System-wide from source virtualenv Available Programs hocr-check -- check the hOCR f

OCRopus 285 Dec 08, 2022
This is a real life mario project using python and mediapipe

real-life-mario This is a real life mario project using python and mediapipe How to run to run this just run - realMario.py file requirements This req

Programminghut 42 Dec 22, 2022
Awesome Spectral Indices in Python.

Awesome Spectral Indices in Python: Numpy | Pandas | GeoPandas | Xarray | Earth Engine | Planetary Computer | Dask GitHub: https://github.com/davemlz/

David Montero Loaiza 98 Jan 02, 2023
Rubik's Cube in pygame with OpenGL

Rubik Rubik's Cube in pygame with OpenGL The script show on the screen a Rubik Cube buit with OpenGL. Then I have also implemented all the possible mo

Gabro 2 Apr 15, 2022
2 telegram-bots: for image recognition and for text generation

💻 📱 Telegram_Bots 🔎 & 📖 2 telegram-bots: for image recognition and for text generation. About Image recognition bot: User sends a photo and bot de

Marina Polukoshko 1 Jan 27, 2022

Installations for running keras-theano on GPU Upgrade pip and install opencv2 cd ~ pip install --upgrade pip pip install opencv-python Upgrade keras

Berat Kurar Barakat 14 Sep 30, 2022
OpenCV-Erlang/Elixir bindings

evision [WIP] : OS : arch Build Status Ubuntu 20.04 arm64 Ubuntu 20.04 armv7 Ubuntu 20.04 s390x Ubuntu 20.04 ppc64le Ubuntu 20.04 x86_64 macOS 11 Big

Cocoa 194 Jan 05, 2023
Some Boring Research About Products Recognition 、Duplicate Img Detection、Img Stitch、OCR

Products Recognition 介绍 商品识别,围绕在复杂的商场零售场景中,识别出货架图像中的商品信息。主要组成部分: 重复图像检测。【更新进度 4/10】 图像拼接。【更新进度 0/10】 目标检测。【更新进度 0/10】 商品识别。【更新进度 1/10】 OCR。【更新进度 1/10】

zhenjieWang 18 Jan 27, 2022
【Auto】原神⭐钓鱼辅助工具 | 自动收竿、校准游标 | ✨您只需要抛出鱼竿,我们会帮你完成一切✨

原神钓鱼辅助工具 ✨ 作者正在努力重构代码中……会尽快带给大家一个更完美的脚本 ✨ 「您只需抛出鱼竿,然后我们会帮您搞定一切」 如果你觉得这个脚本好用,请点一个 Star ⭐ ,你的 Star 就是作者更新最大的动力 点击这里 查看演示视频 ✨ 欢迎大家在 Issues 中分享自己的配置文件 ✨ ✨

261 Jan 02, 2023
Demo for the paper "Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation"

Streaming speaker diarization Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation by Juan Manuel Coria, Hervé

Juanma Coria 185 Jan 01, 2023
The code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Long-term Action Assessment".

Likert Scoring with Grade Decoupling for Long-term Action Assessment This is the code for CVPR2022 paper "Likert Scoring with Grade Decoupling for Lon

10 Oct 21, 2022
Creating of virtual elements of the graphical interface using opencv and mediapipe.

Virtual GUI Creating of virtual elements of the graphical interface using opencv and mediapipe. Element GUI Output Description Button By default the b

Aleksei 4 Jun 16, 2022
Page to PAGE Layout Analysis Tool

P2PaLA Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks. 💥 Try our new DEMO for online baseli

Lorenzo Quirós Díaz 180 Nov 24, 2022
QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021)

QuanTaichi: A Compiler for Quantized Simulations (SIGGRAPH 2021) Yuanming Hu, Jiafeng Liu, Xuanda Yang, Mingkuan Xu, Ye Kuang, Weiwei Xu, Qiang Dai, W

Taichi Developers 119 Dec 02, 2022
EAST for ICPR MTWI 2018 Challenge II (Text detection of network images)

EAST_ICPR2018: EAST for ICPR MTWI 2018 Challenge II (Text detection of network images) Introduction This is a repository forked from argman/EAST for t

QichaoWu 49 Dec 24, 2022
A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well

ocrserver Simple OCR server, as a small working sample for gosseract. Try now here https://ocr-example.herokuapp.com/, and deploy your own now. Deploy

Hiromu OCHIAI 541 Dec 28, 2022
Python tool that takes the OCR.space JSON output as input and draws a text overlay on top of the image.

OCR.space OCR Result Checker = Draw OCR overlay on top of image Python tool that takes the OCR.space JSON output as input, and draws an overlay on to

a9t9 4 Oct 18, 2022