🎐 A Python library for doing approximate and phonetic matching of strings.

Last update: Dec 21, 2022

Overview

jellyfish

Jellyfish is a Python library for doing approximate and phonetic matching of strings.

Written by James Turk and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

See http://jellyfish.readthedocs.io for documentation.

Source is available at http://github.com/jamesturk/jellyfish.

Jellyfish >= 0.7 supports only Python 3. If you need Python 2, please use 0.6.x.

Included Algorithms

String comparison:

Levenshtein Distance
Damerau-Levenshtein Distance
Jaro Distance
Jaro-Winkler Distance
Match Rating Approach Comparison
Hamming Distance
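
To make the first and last entries in this list concrete, here is a minimal pure-Python sketch of Levenshtein and Hamming distance. It is an illustration only, not Jellyfish's implementation: the names levenshtein_sketch and hamming_sketch are hypothetical, and raising an error on unequal lengths in the Hamming case is a choice made for this sketch, so the library may behave differently there.

```python
def levenshtein_sketch(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (illustrative only)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep it
            ))
        previous = current
    return previous[-1]


def hamming_sketch(a, b):
    """Number of positions at which two equal-length strings differ.

    Assumption of this sketch: unequal lengths are rejected.
    """
    if len(a) != len(b):
        raise ValueError("hamming_sketch expects strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))
```

With these definitions, levenshtein_sketch('jellyfish', 'smellyfish') evaluates to 2 (insert 's', substitute 'j' with 'm'), which agrees with the value shown under Example Usage.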

Phonetic encoding:

American Soundex
Metaphone
NYSIIS (New York State Identification and Intelligence System)
Match Rating Codex
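
As a rough illustration of what a phonetic encoder does, the following is a minimal sketch of the commonly described American Soundex rules: keep the first letter, map the remaining consonants to digits, collapse adjacent letters that share a digit (including when separated by H or W), and pad to four characters. The function name soundex_sketch is hypothetical and this is not Jellyfish's own code, so edge cases such as non-ASCII input may differ from the library.

```python
def soundex_sketch(name):
    """Encode a name with the usual American Soundex rules (illustrative only)."""
    # Build the letter-to-digit table; vowels, Y, H, and W carry no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]                  # the first letter is kept as-is
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not separate equal codes
        digit = codes.get(ch, "")
        if digit and digit != previous:
            result += digit
            if len(result) == 4:
                break
        previous = digit                 # a vowel resets this to ""
    return result.ljust(4, "0")          # pad short codes with zeros
```

For example, soundex_sketch('Jellyfish') gives 'J412', the same code shown under Example Usage.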

Example Usage

>>> import jellyfish
>>> # String comparison
>>> jellyfish.levenshtein_distance('jellyfish', 'smellyfish')
2
>>> jellyfish.jaro_distance('jellyfish', 'smellyfish')
0.8962962962962964
>>> jellyfish.damerau_levenshtein_distance('jellyfish', 'jellyfihs')
1

>>> # Phonetic encoding
>>> jellyfish.metaphone('Jellyfish')
'JLFX'
>>> jellyfish.soundex('Jellyfish')
'J412'
>>> jellyfish.nysiis('Jellyfish')
'JALYF'
>>> jellyfish.match_rating_codex('Jellyfish')
'JLLFSH'
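
The Jaro value of roughly 0.896 in the session above can be checked by hand. The sketch below follows the textbook Jaro definition: characters match if they are equal and no further apart than half the longer length minus one, and the score averages the match ratio for each string with the share of matches that are not transposed. The name jaro_sketch is hypothetical and the code is an independent illustration, not the library's implementation.

```python
def jaro_sketch(s1, s2):
    """Textbook Jaro similarity in the range 0.0 to 1.0 (illustrative only)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters match only if they are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        stop = min(len2, i + window + 1)
        for j in range(start, stop):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3
```

For 'jellyfish' and 'smellyfish' there are 8 matching characters and no transpositions, so the score is (8/9 + 8/10 + 8/8) / 3, which is about 0.8963.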

Running Tests

If you are interested in contributing to Jellyfish, you may want to run the tests locally. Jellyfish uses tox to run its tests, which you can set up and run as follows:

pip install tox
# cd jellyfish/
tox
