Semantic similarity computation with different state-of-the-art metrics

Last update: Jun 22, 2022

Description

TaxoSS is a Python library for semantic similarity. It implements state-of-the-art semantic similarity metrics such as Resnik, Jiang-Conrath (JCN), and HSS.

Requirements

Python 3.6 or later
NLTK
NumPy
Pandas

Installation

TaxoSS can be installed through pip (the Python package manager) in the following way:

pip install taxoss

Usage

Semantic similarity functions

You can compute the semantic similarity in the following way:

from TaxoSS.functions import semantic_similarity
semantic_similarity('brother', 'sister', 'hss')

3.353513521371089

The function semantic_similarity(word1, word2, kind, ic) accepts the following values for the argument kind:

hss -> HSS (default)
wup -> WUP
lcs -> LC
path_sim -> Shortest Path
resnik -> Resnik
jcn -> Jiang-Conrath
lin -> Lin
seco -> Seco

For the argument ic, see the Information Content section below.
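To compare several metrics on the same word pair, the call can be wrapped in a small loop. The following is a minimal sketch: compare_metrics and demo are hypothetical helpers, not part of TaxoSS, and the similarity function is passed in as a parameter so that the helper does not depend on the package being installed. The kind strings are the ones listed above.

```python
from typing import Callable, Dict, Iterable

# The kind identifiers listed above.
KINDS = ("hss", "wup", "lcs", "path_sim", "resnik", "jcn", "lin", "seco")


def compare_metrics(
    word1: str,
    word2: str,
    similarity_fn: Callable[[str, str, str], float],
    kinds: Iterable[str] = KINDS,
) -> Dict[str, float]:
    """Score one word pair with every requested metric (hypothetical helper)."""
    return {kind: similarity_fn(word1, word2, kind) for kind in kinds}


def demo() -> None:
    """Print all scores for one pair; requires TaxoSS to be installed."""
    from TaxoSS.functions import semantic_similarity

    scores = compare_metrics("brother", "sister", semantic_similarity)
    for kind, score in scores.items():
        print(f"{kind:>8}: {score}")
```

Calling demo() prints one line per metric. Note that the scores are on different scales (see the example outputs above), so they are best compared per metric across word pairs rather than across metrics.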

Information Content

By default, the Information Content is calculated from a Wikipedia corpus (the default value of the argument ic):

from TaxoSS.functions import semantic_similarity
semantic_similarity('cat', 'dog', 'resnik')

6.169410755220327

Calculating the Information Content from a given corpus:

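from TaxoSS.calculate_IC import calculate_IC
from TaxoSS.functions import semantic_similarity

# Compute the Information Content from the corpus and save it to a file
calculate_IC(path_to_corpus, path_to_save_IC_file)
# Pass the saved file as the ic argument
semantic_similarity('cat', 'dog', 'resnik', path_to_save_IC_file)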

where path_to_save_IC_file is a path inside the TaxoSS package directory of the virtual environment, e.g. venv/lib/python3.6/site-packages/TaxoSS/data/prova_IC.csv.
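For readers unfamiliar with the term, the sketch below illustrates the standard definition of Information Content used by IC-based metrics such as Resnik: IC(c) = -log p(c), where p(c) is the relative frequency of concept c (including its descendants) in a corpus. This is not taken from the TaxoSS source, and TaxoSS may compute or smooth the values differently. The function name and the toy counts are invented for illustration.

```python
import math


def information_content(concept_count: int, total_count: int) -> float:
    """Return -ln(p), where p = concept_count / total_count."""
    if not 0 < concept_count <= total_count:
        raise ValueError("need 0 < concept_count <= total_count")
    return -math.log(concept_count / total_count)


# Toy cumulative counts (each count includes the concept's descendants).
# These numbers are invented for illustration only.
toy_counts = {"entity": 1000, "animal": 100, "cat": 10}
toy_ic = {
    concept: information_content(count, toy_counts["entity"])
    for concept, count in toy_counts.items()
}
```

In this toy taxonomy the root concept has an IC of zero and rarer, more specific concepts have a higher IC. Resnik similarity scores a word pair by the IC of their most informative common ancestor.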

Benchmark

Dataset	HSS (ours) Pearson	HSS (ours) Spearman	WUP Pearson	WUP Spearman	LC Pearson	LC Spearman	Shortest Path Pearson	Shortest Path Spearman	Resnik Pearson	Resnik Spearman	Jiang-Conrath Pearson	Jiang-Conrath Spearman	Lin Pearson	Lin Spearman	Seco Pearson	Seco Spearman
MEN	0.41	0.33	0.36	0.33	0.14	0.05	0.07	0.03	0.05	0.03	-0.05	-0.04	0.05	0.04	-0.01	0.03
MC30	0.74	0.69	0.74	0.73	0.33	0.21	0.22	0.3	0.13	0.03	-0.06	-0.01	0.05	0.01	0.13	-0.09
WSS	0.68	0.65	0.58	0.59	0.36	0.23	0.16	0.1	0.02	-0.03	0.04	0.06	0.03	0.06	-0.01	-0.04
Simlex999	0.4	0.38	0.45	0.43	0.26	0.15	0.2	0.16	-0.04	-0.04	0.12	0.14	0.12	0.14	-0.02	-0.08
MT287	0.46	0.31	0.4	0.28	0.26	0.12	0.11	0.11	0.03	0.04	0.18	0.16	0.22	0.17	0	-0.06
MT771	0.44	0.4	0.43	0.49	0.06	0.02	0.1	0.13	0	-0.01	0	0	0	0	-0.05	-0.03
Time per pair (s)	0.0007	0.0007	0.008	0.008	0.0055	0.0055	0.0064	0.0064	0.5586	0.5586	0.551	0.551	0.5866	0.5866	0.0013	0.0013
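The Pearson and Spearman columns are correlation coefficients. The source does not say what they are computed against; the sketch below assumes the usual setup, in which a metric's scores for the word pairs of a dataset are correlated with the human similarity ratings of the same pairs. It uses only the standard library, and the function names are illustrative and not part of TaxoSS.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Given a list of metric scores and a list of human ratings for the same word pairs, pearson(scores, ratings) and spearman(scores, ratings) would yield one pair of cells in the table, under the assumption stated above.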
