Complete portable pipeline for masking of Aadhaar Number adhering to Govt. Privacy Guidelines.

Overview

Aadhaar Number Masking Pipeline

This project implements a complete pipeline that masks the Aadhaar number in given images, in line with the Govt. of India's privacy guidelines for storing Aadhaar card images in digital repositories. It was carried out as an internship project for Muthoot Finance. We use the open-source CRAFT text detector (Paper | Pretrained Model | Github Repo) provided by Clova AI Research for OSD, and combine a heuristic model with pytesseract OCR for masking.
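To illustrate the masking step, here is a minimal sketch of what a dimensional heuristic could look like. It is not the code in this repository. It assumes the text detector has already produced a bounding box for a 12-digit number and that the first 8 digits are to be hidden, so it masks the leading two thirds of the box width. The names `looks_like_aadhaar`, `region_to_mask`, and `apply_mask` are hypothetical, and the image is a plain list of pixel rows so the example has no dependencies.

```python
import re
from typing import List, Tuple

# (x, y, width, height) of a detected text region; hypothetical layout.
Box = Tuple[int, int, int, int]

# Twelve digits, optionally written as three space-separated groups of four.
AADHAAR_RE = re.compile(r"^\d{4}\s?\d{4}\s?\d{4}$")


def looks_like_aadhaar(text: str) -> bool:
    """Return True if OCR output has the shape of a 12-digit number."""
    return bool(AADHAAR_RE.match(text.strip()))


def region_to_mask(box: Box, fraction: float = 8 / 12) -> Box:
    """Return the left part of the box covering the leading digits.

    Assumes evenly spaced digits, so 8 of 12 digits span about 2/3 of the width.
    """
    x, y, w, h = box
    return (x, y, int(round(w * fraction)), h)


def apply_mask(image: List[List[int]], box: Box, fill: int = 0) -> None:
    """Overwrite the pixels inside the box with a solid fill value, in place."""
    x, y, w, h = box
    for row in image[y:y + h]:
        # Use the actual slice length so boxes that cross the image edge are clipped.
        row[x:x + w] = [fill] * len(row[x:x + w])
```

With a real image, the same rectangle could be passed to a drawing routine such as a filled rectangle in OpenCV. The fixed 2/3 fraction is an assumption that only holds when the digits are evenly spaced and the box tightly bounds the number.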

Rohit Ranjan, Ram Sundaram.

Sample Results

[Figure: teaser image showing sample results]

Versions

The search for the best masking pipeline led us to experiment with several different approaches. We have documented our experiments in other branches.

Branch(->model)      | Speed/Performance                                      | Pipeline
main                 | Best performing                                        | CRAFT + pytesseract + dimensional heuristics
CNN_OCR->cnn_model   | Fastest masking                                        | CRAFT + LeNet trained by us
CNN_OCR->cnn_model_2 | Fastest masking                                        | CRAFT + LeNet trained by us
UNET_OCDR            | Theoretically fastest, but trained model unavailable** | UNet

**We proposed and implemented a pipeline that uses a single UNet model to produce the desired mask. A single model would have made inference very fast and capable of real-time use on mobile devices. Training required creating our own dataset, since the company could not legally provide us with the needed data. After several trials, we halted work on this model: with barely 150 unique data points available, a data-hungry UNet cannot be trained adequately for now.

Datasets

LeNet was trained on our self-created labelled dataset | labels.

Getting started

Install dependencies

Requirements

  • torch
  • opencv-python
  • tesseract-ocr
  • see requirements.txt for the full list
pip install -r requirements.txt
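Before running the pipeline, it can help to confirm that the dependencies are actually available. The following is a small sketch, not part of the repository. It assumes the Python packages import as `torch`, `cv2` (from opencv-python), and `pytesseract`, and that the Tesseract engine is exposed as a `tesseract` executable on the PATH. The function name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil
from typing import List

# Import names assumed for the packages listed above.
PYTHON_MODULES = ["torch", "cv2", "pytesseract"]
# Executable name assumed for the Tesseract OCR engine.
SYSTEM_BINARIES = ["tesseract"]


def missing_dependencies() -> List[str]:
    """Return the names of modules and binaries that cannot be found."""
    missing = []
    for module in PYTHON_MODULES:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    for binary in SYSTEM_BINARIES:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(binary) is None:
            missing.append(binary)
    return missing
```

Calling `missing_dependencies()` returns an empty list when everything is found. Anything it reports as missing still needs to be installed.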

Test instructions

  • Clone this repository
git clone https://github.com/thefurorjuror/Aadhaar_Masker.git
  • Run on an image folder
python [folder path to the cloned repo]/masker.py --test_folder=[folder path to test images] --output_folder=[folder path to output images] --cuda=[True/False]
# Example: when run from inside the cloned repo
python masker.py --test_folder=./images/ --output_folder=./output/ --cuda=True

The --cuda flag is set to False by default.
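If the masker needs to be called from another Python program, the command above can be assembled and launched with the standard library. This is a sketch under the assumption that `masker.py` accepts exactly the three flags shown in the test instructions. The helper names `build_command` and `run_masker` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(repo_dir: str, test_folder: str, output_folder: str,
                  cuda: bool = False) -> List[str]:
    """Assemble the masker.py command line using the documented flags."""
    script = Path(repo_dir) / "masker.py"
    return [
        sys.executable,  # run with the current Python interpreter
        str(script),
        f"--test_folder={test_folder}",
        f"--output_folder={output_folder}",
        f"--cuda={cuda}",  # rendered as True or False
    ]


def run_masker(repo_dir: str, test_folder: str, output_folder: str,
               cuda: bool = False) -> None:
    """Run masker.py and raise CalledProcessError if it exits with an error."""
    command = build_command(repo_dir, test_folder, output_folder, cuda)
    subprocess.run(command, check=True)
```

For example, `run_masker(".", "./images/", "./output/", cuda=True)` mirrors the example command when called from inside the cloned repo.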

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2021-present Rohit Ranjan & Ram Sundaram.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.