Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

(Pronounced as "strog")

Paper

Arxiv

Why it matters?

Scene Text Recognition (STR) requires data augmentation functions that are different from object recognition. STRAug is data augmentation designed for STR. It offers 36 data augmentation functions that are sorted into 8 groups. Each function supports 3 levels or magnitudes of severity or intensity.

Given a source image:

it can be transformed as follows:

  1. warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations
Curve Distort Stretch
  1. geometry.py - to generate Perspective, Rotation, Shrink deformations
Perspective Rotation Shrink
  1. pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid
Grid VGrid HGrid RectGrid EllipseGrid
  1. blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur
GaussianBlur DefocusBlur MotionBlur GlassBlur ZoomBlur
  1. noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise
GaussianNoise ShotNoise ImpulseNoise SpeckleNoise
  1. weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow
Fog Snow Frost Rain Shadow
  1. camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate
Contrast Brightness JpegCompression Pixelate
  1. process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color
Posterize Solarize Invert Equalize
AutoContrast Sharpness Color

Pip install

pip3 install straug

How to use

Command line (e.g. input image is nokia.png):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are in results directory.

Reference

  • Image corruptions (eg blur, noise, camera effects, fog, frost, etc) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}
Owner
Rowel Atienza
Rowel Atienza
Unofficial implementation (replicates paper results!) of MINER: Multiscale Implicit Neural Representations in pytorch-lightning

MINER_pl Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning. 📖 Ref readings Laplacian pyramid explan

AI葵 51 Nov 28, 2022
Robustness via Cross-Domain Ensembles

Robustness via Cross-Domain Ensembles [ICCV 2021, Oral] This repository contains tools for training and evaluating: Pretrained models Demo code Traini

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 27 Dec 23, 2022
Author Disambiguation using Knowledge Graph Embeddings with Literals

Author Name Disambiguation with Knowledge Graph Embeddings using Literals This is the repository for the master thesis project on Knowledge Graph Embe

12 Oct 19, 2022
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
YOLOv7 - Framework Beyond Detection

🔥🔥🔥🔥 YOLO with Transformers and Instance Segmentation, with TensorRT acceleration! 🔥🔥🔥

JinTian 3k Jan 01, 2023
This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

This repository contains an overview of important follow-up works based on the original Vision Transformer (ViT) by Google.

75 Dec 02, 2022
OpenVINO黑客松比赛项目

Window_Guard OpenVINO黑客松比赛项目 英文名称:Window_Guard 中文名称:窗口卫士 硬件 树莓派4B 8G版本 一个磁石开关 USB摄像头(MP4视频文件也可以) 软件(库) OpenVINO RPi 使用方法 本项目使用的OPenVINO是是2021.3版本,并使用了

Tango 6 Jul 04, 2021
Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research

Megaverse Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of the engine enables ph

Aleksei Petrenko 191 Dec 23, 2022
Code implementation from my Medium blog post: [Transformers from Scratch in PyTorch]

transformer-from-scratch Code for my Medium blog post: Transformers from Scratch in PyTorch Note: This Transformer code does not include masked attent

Frank Odom 27 Dec 21, 2022
In this project, we create and implement a deep learning library from scratch.

ARA In this project, we create and implement a deep learning library from scratch. Table of Contents Deep Leaning Library Table of Contents About The

22 Aug 23, 2022
Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020

XDVioDet Official implementation of "Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision" ECCV2020. The proj

peng 64 Dec 12, 2022
Net2net - Network-to-Network Translation with Conditional Invertible Neural Networks

Net2Net Code accompanying the NeurIPS 2020 oral paper Network-to-Network Translation with Conditional Invertible Neural Networks Robin Rombach*, Patri

CompVis Heidelberg 206 Dec 20, 2022
Repo for parser tensorflow(.pb) and tflite(.tflite)

tfmodel_parser .pb file is the format of tensorflow model .tflite file is the format of tflite model, which usually used in mobile devices before star

1 Dec 23, 2021
Remote sensing change detection tool based on PaddlePaddle

PdRSCD PdRSCD(PaddlePaddle Remote Sensing Change Detection)是一个基于飞桨PaddlePaddle的遥感变化检测的项目,pypi包名为ppcd。目前0.2版本,最新支持图像列表输入的训练和预测,如多期影像、多源影像甚至多期多源影像。可以快速完

38 Aug 31, 2022
FB-tCNN for SSVEP Recognition

FB-tCNN for SSVEP Recognition Here are the codes of the tCNN and FB-tCNN in the paper "Filter Bank Convolutional Neural Network for Short Time-Window

Wenlong Ding 12 Dec 14, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Allows including an action inside another action (by preprocessing the Yaml file). This is how composite actions should have worked.

actions-includes Allows including an action inside another action (by preprocessing the Yaml file). Instead of using uses or run in your action step,

Tim Ansell 70 Nov 04, 2022
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

ONNX Runtime is a cross-platform inference and training machine-learning accelerator. ONNX Runtime inference can enable faster customer experiences an

Microsoft 8k Jan 04, 2023
prior-based-losses-for-medical-image-segmentation

Repository for papers: Benchmark: Effect of Prior-based Losses on Segmentation Performance: A Benchmark Midl: A Surprisingly Effective Perimeter-based

Rosana EL JURDI 9 Sep 07, 2022
Distributed Evolutionary Algorithms in Python

DEAP DEAP is a novel evolutionary computation framework for rapid prototyping and testing of ideas. It seeks to make algorithms explicit and data stru

Distributed Evolutionary Algorithms in Python 4.9k Jan 05, 2023