Utility for Text Normalisation or Inverse Normalisation

Overview

Text Processor

Text Normalisation or Inverse Normalisation for Indonesian, e.g.

measurements

  • "123 kg" -> "seratus dua puluh tiga kilogram"

Currency/Money

  • "Harga notebook itu $250 atau sekitar Rp 3,000,000" -> "Harga notebook itu dua ratus lima puluh dollar atau sekitar tiga juta rupiah"

Dates

  • "Pecurian itu terjadi hari minggu kemarin (12/3/2021)" -> "Pecurian itu terjadi hari minggu kemarin dua belas Maret dua ribu dua puluh satu "

Time Zone

  • "Kebakaran terjadi di wilayah jakarta sejak pukul 22.30 WITA di 1.000 rumah" -> "Kebakaran terjadi di wilayah jakarta sejak pukul dua puluh dua lewat tiga puluh menit Waktu Indonesia Tengah di seribu rumah"

Usage

from text_processor import TextProcessor

tp = TextProcessor()
print(tp.normalize("123 kg"))
Owner
Cahya Wirawan
System engineer, currently working on NLP, CV and Speech Recognition for fun and curiosity
Cahya Wirawan
Amazing GitHub Template - Sane defaults for your next project!

🚀 Useful README.md, LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md, GitHub Issues and Pull Requests and Actions templates to jumpstart your projects.

276 Jan 01, 2023
The bot creates hashtags for user's texts in Russian and English.

telegram_bot_hashtags The bot creates hashtags for user's texts in Russian and English. It is a simple bot for creating hashtags. NOTE file config.py

Yana Davydovich 2 Feb 12, 2022
Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

David Drysdale 3.1k Dec 29, 2022
This is an AI that is supposed to say you if your text is formal or not

This is an AI that is supposed to say you if your text is formal or not. It's written in Python 3 and has some german examples (because I'm german yk) in the text.json file. This file contains the te

1 Jan 12, 2022
utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresses and hashtags.

utoken utoken is a multilingual tokenizer that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email-addresse

Ulf Hermjakob 11 Jan 05, 2023
pydantic-i18n is an extension to support an i18n for the pydantic error messages.

pydantic-i18n is an extension to support an i18n for the pydantic error messages

Boardpack 48 Dec 21, 2022
split Word file by chapter

split Word file by chapter we use the mircosoft word api to code this tool api url:https://docs.microsoft.com/zh-cn/dotnet/api/ if this tool is good f

wisdom under lemon trees 5 Nov 06, 2021
"Complexity" of Flags of the countries of the world

"Complexity" of Flags of the countries of the world Flags (png) from: https://flagcdn.com/w2560.zip https://flagpedia.net/download/images run: chmod +

Alexander Lelchuk 1 Feb 10, 2022
Goblin-sim - Procedural fantasy world generator

goblin-sim This project is an attempt to create a procedural goblin fantasy worl

3 May 18, 2022
Format Covid values to ASCII-Table (Only for Germany and Austria)

Covid-19-Formatter (Only for Germany and Austria) Dieses Script speichert die gemeldeten Daten des RKIs / BMSGPK und formatiert diese zu einer Asci Ta

56 Jan 22, 2022
Search for terms(word / table / field name or any) under Snowflake schema names

snowflake-search-terms-in-ddl-views Search for terms(word / table / field name or any) under Snowflake schema names Version : 1.0v How to use ? Run th

Igal Emona 1 Dec 15, 2021
Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)Wikipedia Extractive Text Summarizer + Keywords Identification (entropy-based)

Kevin Lai 1 Nov 08, 2021
RSS Reader application for the Emacs Application Framework.

EAF RSS Reader RSS Reader application for the Emacs Application Framework. Load application (add-to-list 'load-path "~/.emacs.d/site-lisp/eaf-rss-read

EAF 15 Dec 07, 2022
Adventura is an open source Python Text Adventure Engine

Adventura Adventura is an open source Python Text Adventure Engine, Not yet uplo

5 Oct 02, 2022
An extension to detect if the articles content match its title.

Clickbait Detector An extension to detect if the articles content match its title. This was developed in a period of 24-hours in a hackathon called 'H

Arvind Krishna 5 Jul 26, 2022
py-trans is a Free Python library for translate text into different languages.

Free Python library to translate text into different languages.

I'm Not A Bot #Left_TG 13 Aug 27, 2022
BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes.

BaseCrack Decoder For Base Encoding Schemes BaseCrack is a tool written in Python that can decode all alphanumeric base encoding schemes. This tool ca

Mufeed VH 383 Dec 27, 2022
Migrates translations to the REDCap native Multi-Language Management system

Automates much of the process of moving translations from the old Multilingual external module to the newer built-in Multi-Language Management (MLM) page.

UCI MIND 3 Sep 27, 2022
Split large XML files into smaller ones for easy upload

Split large XML files into smaller ones for easy upload. Works for WordPress Posts Import and other XML files.

Joseph Adediji 1 Jan 30, 2022
Little python script + dictionary to help solve Wordle puzzles

Wordle Solver Little python script + dictionary to help solve Wordle puzzles Usage Usage: ./wordlesolver.py [letters in word] [letters not in word] [p

Luke Stephens (hakluke) 4 Jul 24, 2022