Python binding for Morfologik

Morfologik is a Polish morphological analyzer. For more information, see http://github.com/morfologik/morfologik-stemming/ and http://www.morfologik.blogspot.com/

Requirements

This binding works with Python 2 and Python 3.

Installation

Install it with pip:

pip install pyMorfologik

or clone it directly from GitHub:

git clone https://github.com/dmirecki/pyMorfologik.git

Usage

Currently, only simple stemming is supported:

>>> from pymorfologik import Morfologik
>>> from pymorfologik.parsing import ListParser
>>>
>>> parser = ListParser()
>>> stemmer = Morfologik()
>>> stemmer.stem(['Ala ma kota'], parser)
[(u'Ala',
  {u'Al': [u'subst:sg:acc:m1+subst:sg:gen:m1'],
   u'Ala': [u'subst:sg:nom:f'],
   u'Alo': [u'subst:sg:acc:m1+subst:sg:gen:m1']}),
 (u'ma',
  {u'mieć': [u'verb:fin:sg:ter:imperf:refl.nonrefl'],
   u'mój': [u'adj:sg:nom.voc:f:pos']}),
 (u'kota', {u'kot': [u'subst:sg:acc:m1'], u'kota': [u'subst:sg:nom:f']})]
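
The result above is a list of (word, analyses) pairs, where each analyses dict maps a candidate lemma to a list of tag strings. The sketch below shows one possible way to post-process such a result. It does not call the library: the data is copied from the output above, and the helper names lemmas_by_word and split_tags are hypothetical, not part of pyMorfologik. It also assumes that "+" separates alternative analyses and ":" separates the fields of one analysis, which is inferred only from the sample output.

```python
# Sample data copied from the stemmer output shown above.
result = [
    ('Ala', {'Al': ['subst:sg:acc:m1+subst:sg:gen:m1'],
             'Ala': ['subst:sg:nom:f'],
             'Alo': ['subst:sg:acc:m1+subst:sg:gen:m1']}),
    ('ma', {'mieć': ['verb:fin:sg:ter:imperf:refl.nonrefl'],
            'mój': ['adj:sg:nom.voc:f:pos']}),
    ('kota', {'kot': ['subst:sg:acc:m1'],
              'kota': ['subst:sg:nom:f']}),
]


def lemmas_by_word(result):
    """Map each input word to a sorted list of its candidate lemmas.

    Hypothetical helper. A word that occurs more than once in the
    input keeps only its last set of analyses.
    """
    return {word: sorted(analyses) for word, analyses in result}


def split_tags(tag_string):
    """Split a tag string into alternatives, each a list of fields.

    Assumption: '+' separates alternative analyses and ':' separates
    the fields within one analysis.
    """
    return [alternative.split(':') for alternative in tag_string.split('+')]


for word, lemmas in lemmas_by_word(result).items():
    print(word, '->', ', '.join(lemmas))
```

Keeping the post-processing separate from the stemmer call means it can be tried on saved output without running Morfologik itself.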

Acknowledgements

This repo is based on Morfologik, a great contribution by Marcin Miłkowski (http://marcinmilkowski.pl) and Dawid Weiss (http://www.dawidweiss.com).

Contributions

Damian Mirecki

Adrian Bohdanowicz
