Yaspeller Dictionary (Auto)builder

Usage

# this sample command generates `./yaspeller_report.json`
# yaspeller --report json --ignore-digits --ignore-text "'.*" --ignore-latin --only-errors --file-extensions ".md" --lang ru

python -m venv env
source env/bin/activate
pip install 
python src/dictionary.py yaspeller_report.json

Why

Yaspeller is nice, but typical documentation contains too many anglicisms. Normally you just want to ignore them, but the only way to do that is to add an array of regexps for the words to ignore.

This tool generates an array of dictionary entries that covers all word forms (lexemes) in all grammatical cases, like

[
    "[бБ]аг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[дД]ифф(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[кК]оммит(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[пП]атчинг(а|ам|ами|ах|е|и|ов|ом|у)?",
    "[рР]убист(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "[сС]амоорганизованн(ого|ом|ому|ую|ые|ый|ым|ыми|ых)",
    "[тТ]икет(а|ам|ами|ах|е|ов|ом|у|ы)?",
    "коммитить"
]

from yaspeller errors, which look like this in text format:

Spelling check:
✗ www.ruby-lang.org/ru/community/ruby-core/index.md 130 ms
-----
Typos: 9
1. патчингом (36:27)
2. коммитить (68:32, suggest: комитет)
3. багах (75:15, suggest: богах, баках, бегах)
4. баги (89:24, suggest: багги)
5. баг (96:25)
6. тикет (107:14, suggest: этикет)
7. дифф (115:18)
8. коммиту (147:24, suggest: комету, комнату)
9. коммита (148:58, suggest: комета)
-----
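
The mapping from the report above to the regexp array can be sketched roughly as follows. This is a minimal illustration, not the code in `src/dictionary.py`: the helper names `extract_typos` and `build_pattern` are hypothetical, the sketch parses the text report shown above rather than the JSON report the tool actually takes, and it only collapses the forms it is given instead of generating every case form of a word.

```python
import os
import re

# Matches report lines such as "3. багах (75:15, suggest: богах, баках, бегах)".
TYPO_LINE = re.compile(r"^\s*\d+\.\s+(\S+)\s+\(")


def extract_typos(report_text):
    """Return the sorted, de-duplicated words flagged in a text-format report."""
    words = set()
    for line in report_text.splitlines():
        match = TYPO_LINE.match(line)
        if match:
            words.add(match.group(1))
    return sorted(words)


def build_pattern(forms):
    """Collapse forms of one word into a single regexp string.

    Assumption: all forms share a non-empty common prefix (the stem).
    """
    words = sorted({form.lower() for form in forms})
    if len(words) == 1:
        # A single form is kept as a plain entry.
        return words[0]
    stem = os.path.commonprefix(words)
    if not stem:
        raise ValueError("forms do not share a common stem")
    endings = sorted(word[len(stem):] for word in words if word != stem)
    # Accept both a lowercase and an uppercase first letter.
    head = "[{}{}]".format(stem[0], stem[0].upper())
    # The ending group is optional only when the bare stem is itself a form.
    optional = "?" if stem in words else ""
    return "{}{}({}){}".format(head, stem[1:], "|".join(endings), optional)
```

For instance, `build_pattern(["баг", "багах", "баги"])` collapses the three forms from the report into one entry with the stem `баг` and an optional group of endings. Deciding which reported words belong to the same lexeme is not covered by this sketch.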

Live example

Initially created for spellchecking the www.ruby-lang.org translations.

🤕 spelling exceptions builder for lazy people

Owner

Vlad Bokov
