Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Last update: Dec 28, 2022

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

(Pronounced as "strog")

Paper

Arxiv

Why it matters?

Scene Text Recognition (STR) requires data augmentation functions that differ from those used in object recognition. STRAug is a set of data augmentation functions designed for STR. It offers 36 augmentation functions sorted into 8 groups. Each function supports 3 levels (magnitudes) of severity.

Given a source image:

it can be transformed as follows:

warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations

`Curve`	`Distort`	`Stretch`

geometry.py - to generate Perspective, Rotation, Shrink deformations

`Perspective`	`Rotation`	`Shrink`

pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid

`Grid`	`VGrid`	`HGrid`	`RectGrid`	`EllipseGrid`

blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur

`GaussianBlur`	`DefocusBlur`	`MotionBlur`	`GlassBlur`	`ZoomBlur`

noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise

`GaussianNoise`	`ShotNoise`	`ImpulseNoise`	`SpeckleNoise`

weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow

`Fog`	`Snow`	`Frost`	`Rain`	`Shadow`

camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate

`Contrast`	`Brightness`	`JpegCompression`	`Pixelate`

process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color

`Posterize`	`Solarize`	`Invert`	`Equalize`

`AutoContrast`	`Sharpness`	`Color`
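The grouping above can be written down as a plain dictionary, which is handy for sampling augmentations by group or for checking the 8-group, 36-function count. This is only a sketch: `STRAUG_GROUPS`, `count_functions` and `load_op` are hypothetical names, not part of the package. `load_op` assumes every group file is importable as `straug.<group>` (as `straug.warp` is in the usage example) and that each class can be constructed with no arguments (as `Curve()` is).

```python
import importlib

# Group (module) name -> augmentation names, copied from the list above.
STRAUG_GROUPS = {
    "warp": ["Curve", "Distort", "Stretch"],
    "geometry": ["Perspective", "Rotation", "Shrink"],
    "pattern": ["Grid", "VGrid", "HGrid", "RectGrid", "EllipseGrid"],
    "blur": ["GaussianBlur", "DefocusBlur", "MotionBlur", "GlassBlur", "ZoomBlur"],
    "noise": ["GaussianNoise", "ShotNoise", "ImpulseNoise", "SpeckleNoise"],
    "weather": ["Fog", "Snow", "Frost", "Rain", "Shadow"],
    "camera": ["Contrast", "Brightness", "JpegCompression", "Pixelate"],
    "process": ["Posterize", "Solarize", "Invert", "Equalize",
                "AutoContrast", "Sharpness", "Color"],
}


def count_functions(groups=STRAUG_GROUPS):
    """Return the total number of augmentation names across all groups."""
    return sum(len(names) for names in groups.values())


def load_op(group, name):
    """Hypothetical helper: import straug.<group> and instantiate <name>.

    Assumption: the module path mirrors the file name (warp.py -> straug.warp)
    and the class takes no constructor arguments. Requires straug to be installed.
    """
    module = importlib.import_module(f"straug.{group}")
    return getattr(module, name)()
```

With the package installed, `load_op("warp", "Curve")` would be expected to return the same kind of object as `Curve()` in the usage example, under the assumptions stated above.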

Pip install

pip3 install straug

How to use

Python interactive prompt (e.g. the input image is nokia.png):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")
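During training, an augmentation is often applied only some of the time and at a varying magnitude. The sketch below shows one possible way to do that around the `op(img, mag=...)` call pattern used above. `random_augment` is a hypothetical helper, not part of straug, and the list of magnitude values is left to the caller because the valid range of `mag` is not spelled out here.

```python
import random


def random_augment(img, ops, mags, prob=0.5, rng=None):
    """Apply one randomly chosen op at a randomly chosen magnitude.

    img  : the input image (e.g. a PIL image)
    ops  : list of callables invoked as op(img, mag=...)
    mags : magnitude values to sample from (assumption: caller knows valid values)
    prob : probability of applying any augmentation at all
    rng  : optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    # Skip augmentation when there is nothing to apply or the coin flip fails.
    if not ops or rng.random() >= prob:
        return img
    op = rng.choice(ops)
    mag = rng.choice(mags)
    return op(img, mag=mag)


# Usage sketch (assumes straug and Pillow are installed):
#   from straug.warp import Curve
#   from PIL import Image
#   img = Image.open("nokia.png")
#   img = random_augment(img, ops=[Curve()], mags=[3], prob=0.5)
```

Passing `rng=random.Random(0)` makes the choice of op and magnitude repeatable, which can help when comparing training runs.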

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are saved in the results directory.

Reference

Image corruptions (e.g., blur, noise, camera effects, fog, frost) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}


Owner

Rowel Atienza
