STEFANN: Scene Text Editor using Font Adaptive Neural Network

Overview

Getting Started  •   Training Networks  •   External Links  •   Citation  •   License



The official GitHub repository for the CVPR 2020 paper "STEFANN: Scene Text Editor using Font Adaptive Neural Network".


Getting Started

1. Installing Dependencies

Package      Source   Version   Tested (Updated on April 14, 2020)
Python       Conda    3.7.7     ✔️
Pip          Conda    20.0.2    ✔️
Numpy        Conda    1.18.1    ✔️
Requests     Conda    2.23.0    ✔️
TensorFlow   Conda    2.1.0     ✔️
Keras        Conda    2.3.1     ✔️
Pillow       Conda    7.0.0     ✔️
Colorama     Conda    0.4.3     ✔️
OpenCV       PyPI     4.2.0     ✔️
PyQt5        PyPI     5.14.2    ✔️
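
Once one of the environments described below is created and activated, the installed versions can be compared against this table. A minimal sketch (not part of this repository) that prints the version of each dependency:

import importlib

# Importable module names for the listed packages; PyQt5 exposes its version via QtCore.
PACKAGES = ["numpy", "requests", "tensorflow", "keras", "PIL", "colorama", "cv2", "PyQt5.QtCore"]

for name in PACKAGES:
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", getattr(module, "PYQT_VERSION_STR", "unknown"))
        print(f"{name:15s} {version}")
    except ImportError:
        print(f"{name:15s} not installed")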

💥 Quick installation

Step 1: Install Git and Conda package manager (Miniconda / Anaconda)

Step 2: Update and configure Conda

conda update conda
conda config --set env_prompt "({name}) "

Step 3: Clone this repository and change directory to repository root

git clone https://github.com/prasunroy/stefann.git
cd stefann

Step 4: Create an environment and install dependencies

On Linux and Windows

  • To create CPU environment: conda env create -f release/env_cpu.yml
  • To create GPU environment: conda env create -f release/env_gpu.yml

On macOS

  • To create CPU environment: conda env create -f release/env_osx.yml

💥 Quick test

Step 1: Download the models and pretrained checkpoints into the release/models directory

Step 2: Download the sample images and extract them into the release/sample_images directory

stefann/
├── ...
├── release/
│   ├── models/
│   │   ├── colornet.json
│   │   ├── colornet_weights.h5
│   │   ├── fannet.json
│   │   └── fannet_weights.h5
│   ├── sample_images/
│   │   ├── 01.jpg
│   │   ├── 02.jpg
│   │   └── ...
│   └── ...
└── ...
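
The filenames above suggest that each network is stored as a Keras architecture JSON plus an HDF5 weights file. A minimal sketch of loading the pretrained models under that assumption, run from inside the release directory so the relative paths resolve:

from keras.models import model_from_json

def load_model(json_path, weights_path):
    # Rebuild a Keras model from its serialized architecture and load its trained weights.
    with open(json_path, "r") as f:
        model = model_from_json(f.read())
    model.load_weights(weights_path)
    return model

fannet = load_model("models/fannet.json", "models/fannet_weights.h5")
colornet = load_model("models/colornet.json", "models/colornet_weights.h5")

fannet.summary()
colornet.summary()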

Step 3: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 4: Change directory to release and run STEFANN

cd release
python stefann.py

2. Editing Results 😆


Each image pair consists of the original image (Left) and the edited image (Right).


Training Networks

1. Downloading Datasets

Download the datasets and extract the archives into the datasets directory under the repository root.

stefann/
├── ...
├── datasets/
│   ├── fannet/
│   │   ├── pairs/
│   │   ├── train/
│   │   └── valid/
│   └── colornet/
│       ├── test/
│       ├── train/
│       └── valid/
└── ...

📌 Description of datasets/fannet

This dataset is used to train FANnet and it consists of 3 directories: fannet/pairs, fannet/train and fannet/valid. The directories fannet/train and fannet/valid consist of 1015 and 300 sub-directories respectively, each corresponding to one specific font. Each font directory contains 64x64 grayscale images of 62 English alphanumeric characters (10 numerals + 26 upper-case letters + 26 lower-case letters). The filename format is xx.jpg where xx is the ASCII value of the corresponding character (e.g. "48.jpg" implies an image of character "0"). The directory fannet/pairs contains 50 image pairs, each corresponding to a random font from fannet/valid. Each image pair is horizontally concatenated to a dimension of 128x64. The filename format is id_xx_yy.jpg where id is the image identifier, xx and yy are the ASCII values of source and target characters respectively (e.g. "00_65_66.jpg" implies a transformation from source character "A" to target character "B" for the image with identifier "00").
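
Because the filenames encode the ASCII value of each character, mapping between characters and files is straightforward. A small illustrative helper (the font directory name below is hypothetical):

import os

def glyph_path(split_dir, font_dir, char):
    # Path of the 64x64 grayscale glyph image for `char` in a given font directory.
    return os.path.join(split_dir, font_dir, f"{ord(char)}.jpg")

def pair_filename(identifier, src_char, tgt_char):
    # Filename of a fannet/pairs image for a source-to-target transformation.
    return f"{identifier}_{ord(src_char)}_{ord(tgt_char)}.jpg"

print(glyph_path("datasets/fannet/train", "some_font", "0"))  # -> datasets/fannet/train/some_font/48.jpg
print(pair_filename("00", "A", "B"))                          # -> 00_65_66.jpg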

📌 Description of datasets/colornet

This dataset is used to train Colornet and it consists of 3 directories: colornet/test, colornet/train and colornet/valid. Each directory consists of 5 sub-directories: _color_filters, _mask_pairs, input_color, input_mask and output_color. The directory _color_filters contains synthetically generated color filters of dimension 64x64 including both solid and gradient colors. The directory _mask_pairs contains a set of 64x64 grayscale image pairs selected at random from 1315 available fonts in datasets/fannet. Each image pair is horizontally concatenated to a dimension of 128x64. For colornet/train and colornet/valid each color filter is applied on each mask pair. This results in 64x64 image triplets of color source image, binary target image and color target image in input_color, input_mask and output_color directories respectively. For colornet/test one color filter is applied only on one mask pair to generate similar image triplets. With a fixed set of 100 mask pairs, 80000 colornet/train and 20000 colornet/valid samples are generated from 800 and 200 color filters respectively. With another set of 50 mask pairs, 50 colornet/test samples are generated from 50 color filters.
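
As a rough sketch of how one such triplet could be assembled from a 64x64 color filter and a 128x64 mask pair; the exact compositing used to generate the released dataset may differ, and the filenames below are hypothetical:

import cv2
import numpy as np

color_filter = cv2.imread("datasets/colornet/train/_color_filters/example.jpg")
mask_pair = cv2.imread("datasets/colornet/train/_mask_pairs/example.jpg", cv2.IMREAD_GRAYSCALE)

# Split the horizontally concatenated 128x64 pair into source and target masks.
src_mask, tgt_mask = mask_pair[:, :64], mask_pair[:, 64:]

def colorize(mask, filt):
    # Apply the color filter wherever the grayscale mask is "on".
    alpha = (mask.astype(np.float32) / 255.0)[..., None]
    return (filt.astype(np.float32) * alpha).astype(np.uint8)

input_color = colorize(src_mask, color_filter)    # color source image
input_mask = tgt_mask                             # binary target image
output_color = colorize(tgt_mask, color_filter)   # color target image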

2. Training FANnet and Colornet

Step 1: Activate environment

To activate CPU environment: conda activate stefann-cpu
To activate GPU environment: conda activate stefann-gpu

Step 2: Change directory to project root

cd stefann

Step 3: Configure and train FANnet

To configure training options, edit the configurations section (lines 40-72) of fannet.py
To start training: python fannet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of FANnet.

Step 4: Configure and train Colornet

To configure training options, edit the configurations section (lines 38-65) of colornet.py
To start training: python colornet.py

☁️ Check this notebook hosted at Kaggle for an interactive demonstration of Colornet.

External Links

Project  •   Paper  •   Supplementary Materials  •   Datasets  •   Models  •   Sample Images


Citation

@InProceedings{Roy_2020_CVPR,
  title     = {STEFANN: Scene Text Editor using Font Adaptive Neural Network},
  author    = {Roy, Prasun and Bhattacharya, Saumik and Ghosh, Subhankar and Pal, Umapada},
  booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2020}
}

License

Copyright 2020 by the authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Made with ❤️ and 🍕 on Earth.