🔤 Measure edit distance based on keyboard layout

Related tags

Miscellaneousclavier
Overview

clavier

Measure edit distance based on keyboard layout.



Table of contents

Introduction

Default edit distances, such as the Levenshtein distance, don't differentiate between characters. The distance between two characters is either 0 or 1. This package allows you to measure edit distances by taking into account keyboard layouts.

The scope is purposefully limited to alphabetical, numeric, and punctuation keys. That's because this package is meant to assist in analyzing user inputs -- e.g. for spelling correction in a search engine.

The goal of this package is to be flexible. You can define any logical layout, such as QWERTY or AZERTY. You can also control the physical layout by defining where the keys are on the board.

Installation

pip install git+https://github.com/MaxHalford/clavier

User guide

Keyboard layouts

☝️ Things are a bit more complicated than QWERTY vs. AZERTY vs. XXXXXX. Each layout has many variants. I haven't yet figured out a comprehensive way to map all these out.

This package provides a list of keyboard layouts. For instance, we'll load the QWERTY keyboard layout.

>>> import clavier
>>> keyboard = clavier.load_qwerty()
>>> keyboard
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /

>>> keyboard.shape
(4, 13)

>>> len(keyboard)
46

Here is the list of currently available layouts:

>>> for layout in (member for member in dir(clavier) if member.startswith('load_')):
...     print(layout.replace('load_', ''))
...     exec(f'print(clavier.{layout}())')
...     print('---')
dvorak
` 1 2 3 4 5 6 7 8 9 0 [ ]
' , . p y f g c r l / = \
a o e u i d h t n s -
; q j k x b m w v z
---
qwerty
1 2 3 4 5 6 7 8 9 0 - =
q w e r t y u i o p [ ] \
a s d f g h j k l ; '
z x c v b n m , . /
---

Distance between characters

Measure the Euclidean distance between two characters on the keyboard.

>>> keyboard.char_distance('1', '2')
1.0

>>> keyboard.char_distance('q', '2')
1.4142135623730951

>>> keyboard.char_distance('1', 'm')
6.708203932499369

Distance between words

Measure a modified version of the Levenshtein distance, where the substitution cost is the output of the char_distance method.

>>> keyboard.word_distance('apple', 'wople')
2.414213562373095

>>> keyboard.word_distance('apple', 'woplee')
3.414213562373095

You can also override the deletion cost by specifying the deletion_cost parameter, and the insertion cost via the insertion_cost parameter. Both default to 1.

Typing distance

Measure the sum of distances between each pair of consecutive characters. This can be useful for studying keystroke dynamics.

>>> keyboard.typing_distance('hello')
10.245040190466598

For sentences, you can split them up into words and sum the typing distances.

>>> sentence = 'the quick brown fox jumps over the lazy dog'
>>> sum(keyboard.typing_distance(word) for word in sentence.split(' '))
105.60457487263012

Interestingly, this can be used to compare keyboard layouts in terms of efficiency. For instance, the Dvorak keyboard layout is supposedly more efficient than the QWERTY layout. Let's compare both on the first stanza of If— by Rudyard Kipling:

>> words = list(map(str.lower, stanza.split())) >>> qwerty = clavier.load_qwerty() >>> sum(qwerty.typing_distance(word) for word in words) 740.3255229138255 >>> dvorak = clavier.load_dvorak() >>> sum(dvorak.typing_distance(word) for word in words) 923.6597116104518 ">
>>> stanza = """
... If you can keep your head when all about you
...    Are losing theirs and blaming it on you;
... If you can trust yourself when all men doubt you,
...    But make allowance for their doubting too;
... If you can wait and not be tired by waiting,
...    Or, being lied about, don't deal in lies,
... Or, being hated, don't give way to hating,
...    And yet don't look too good, nor talk too wise;
... """

>>> words = list(map(str.lower, stanza.split()))

>>> qwerty = clavier.load_qwerty()
>>> sum(qwerty.typing_distance(word) for word in words)
740.3255229138255

>>> dvorak = clavier.load_dvorak()
>>> sum(dvorak.typing_distance(word) for word in words)
923.6597116104518

It seems the Dvorak layout is in fact slower than the QWERTY layout. But of course this might not be the case in general.

Nearest neighbors

You can iterate over the k nearest neighbors of any character.

>>> qwerty = clavier.load_qwerty()
>>> for char, dist in qwerty.nearest_neighbors('s', k=8, cache=True):
...     print(char, f'{dist:.4f}')
w 1.0000
a 1.0000
d 1.0000
x 1.0000
q 1.4142
e 1.4142
z 1.4142
c 1.4142

The cache parameter determines whether or not the result should be cached for the next call.

Physical layout specification

By default, the keyboard layouts are ortholinear, meaning that the characters are physically arranged over a grid. You can customize the physical layout to make it more realistic and thus obtain distance measures which are closer to reality. This can be done by specifying parameters to the keyboards when they're loaded.

Staggering

Staggering is the amount of offset between two consecutive keyboard rows.

You can specify a constant staggering as so:

>>> keyboard = clavier.load_qwerty(staggering=0.5)

By default the keys are spaced by 1 unit. So a staggering value of 0.5 implies a 50% horizontal shift between each pair of consecutive rows. You may also specify a different amount of staggering for each pair of rows:

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])

There's 3 elements in the list because the keyboard has 4 rows.

Key pitch

Key pitch is the amount of distance between the centers of two adjacent keys. Most computer keyboards have identical horizontal and vertical pitches, because the keys are all of the same size width and height. But this isn't the case for mobile phone keyboards. For instance, iPhone keyboards have a higher vertical pitch.

Drawing a keyboard layout

>>> keyboard = clavier.load_qwerty()
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty.png', bbox_inches='tight')

qwerty

>>> keyboard = clavier.load_qwerty(staggering=[0.5, 0.25, 0.5])
>>> ax = keyboard.draw()
>>> ax.get_figure().savefig('img/qwerty_staggered.png', bbox_inches='tight')

qwerty_staggered

Custom layouts

You can of course specify your own keyboard layout. There are different ways to do this. We'll use the iPhone keypad as an example.

The from_coordinates method

>>> keypad = clavier.Keyboard.from_coordinates({
...     '1': (0, 0), '2': (0, 1), '3': (0, 2),
...     '4': (1, 0), '5': (1, 1), '6': (1, 2),
...     '7': (2, 0), '8': (2, 1), '9': (2, 2),
...     '*': (3, 0), '0': (3, 1), '#': (3, 2),
...                  '☎': (4, 1)
... })
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

The from_grid method

>> keypad 1 2 3 4 5 6 7 8 9 * 0 # ☎ ">
>>> keypad = clavier.Keyboard.from_grid("""
...     1 2 3
...     4 5 6
...     7 8 9
...     * 0 #
...       ☎
... """)
>>> keypad
1 2 3
4 5 6
7 8 9
* 0 #

Development

git clone https://github.com/MaxHalford/clavier
cd clavier
pip install poetry
poetry install
poetry shell
pytest

License

The MIT License (MIT). Please see the license file for more information.

Owner
Max Halford
Going where the wind blows 🍃 🦔
Max Halford
Automates the fixing of problems reported by yamllint by parsing its output

yamlfixer yamlfixer automates the fixing of problems reported by yamllint by parsing its output. Usage This software automatically fixes some errors a

OPT Nouvelle Caledonie 26 Dec 26, 2022
Ballcone is a fast and lightweight server-side Web analytics solution.

Ballcone Ballcone is a fast and lightweight server-side Web analytics solution. It requires no JavaScript on your website. Screenshots Design Goals Si

Dmitry Ustalov 49 Dec 11, 2022
Materials for the Introduction in Python , Linux , Git and Github

This repository contains all the materials of the presentation on the introduction of python, linux, git and Github.

AMMI 3 Aug 28, 2022
Grail(TM) is a web browser written in Python

Grail is distributed in source form. It requires that you have a Python interpreter and a Tcl/Tk installation, with the Python interpreter configured for Tcl/Tk support.

22 Oct 18, 2022
The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

赛题介绍 芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要

tongji40 16 Jun 17, 2022
A simple but flexible plugin system for Python.

PluginBase PluginBase is a module for Python that enables the development of flexible plugin systems in Python. Step 1: from pluginbase import PluginB

Armin Ronacher 1k Dec 16, 2022
A collection of simple tools that proved to be needed for hadling large periodic calculations with the VASP software package.

VESTA-tools A collection of simple tools that proved to be needed for handling large periodic calculations with the VASP software package. distTotCalc

Ilia Kichev 2 Dec 14, 2021
List of Linux Tools I put on almost every linux / Debian host

Linux-Tools List of Linux Tools I put on almost every Linux / Debian host Installed: geany -- GUI editor/ notepad++ like chkservice -- TUI Linux ser

Stew Alexander 20 Jan 02, 2023
🌌 Economics Observatory Visualisation Repository

Economics Observatory Visualisation Repository Website | Visualisations | Data | Here you will find all the data visualisations and infographics attac

Economics Observatory 3 Dec 14, 2022
IST-Website - IST Tutoring Portal for python

IST Tutoring Portal This portal is a web based interface to handle student help

Jean 3 Jan 03, 2022
Xbox-Flood is for flood anything

Intruduction Installation Usage Installing Python 3 Wiki Getting Started Creating a Key Intruduction Xbox-Flood is for flooding messages (invitations

kayake 4 Feb 18, 2022
Beginner Projects A couple of beginner projects here

Beginner Projects A couple of beginner projects here, listed from easiest to hardest :) selector.py: simply a random selector to tell me who to faceti

Kylie 272 Jan 07, 2023
Project5 Data processing system

Project5-Data-processing-system User just needed to copy both these file to a folder and open Project5.py using cmd or using any python ide. It is to

1 Nov 23, 2021
Pdraw - Generate Deterministic, Procedural Artwork from Arbitrary Text

pdraw.py: Generate Deterministic, Procedural Artwork from Arbitrary Text pdraw a

Brian Schrader 2 Sep 12, 2022
AdventOfCode 2021 solutions from the Devcord server

adventofcode-21 Ein Sammel-Repository für Advent of Code 2021-Lösungen der deutschen DevCord-Community. A repository collecting Advent of Code 2021 so

Devcord 12 Aug 26, 2022
Programmatic startup/shutdown of ASGI apps.

asgi-lifespan Programmatically send startup/shutdown lifespan events into ASGI applications. When used in combination with an ASGI-capable HTTP client

Florimond Manca 129 Dec 27, 2022
Just messing around with AI for fun coding 😂

Python-AI Projects 🤖 World Clock ⏰ ⚙︎ Steps to run world-clock.py file Download and open the file in your Python IDE. Run the file a type the name of

Danish Saleem 0 Feb 10, 2022
Osintgram by Datalux but i fixed some errors i found and made it look cleaner

OSINTgram-V2 OSINTgram-V2 is made from Osintgram which is made by Datalux originally but i took the script and fixed some errors i found and made the

2 Feb 02, 2022
Notifies server owners of mod updates, also notifies of player deaths and player joins through Discord.

ProjectZomboid-ServerAssistant Notifies server owners of mod updates, also notifies of player deaths and player joins through Discord. A Python based

3 Sep 30, 2022
Blender addon that enables exporting of xmodels from blender. Great for custom asset creation for cod games

Birdman's XModel Tools For Blender Greetings everyone in the custom cod community. This blender addon should finally enable exporting of custom assets

wast 2 Jul 02, 2022