A Python library for generating new text from existing samples.

Last update: May 17, 2022

Overview

ReMarkov is a Python library for generating text from existing samples using Markov chains. You can use it to produce all sorts of writing, from birthday messages and horoscopes to Wikipedia articles and the utterances of your game's NPCs. No omnipotent "AI" is involved: the code is dead simple and therefore fast.

Check out the examples and feel free to contribute!
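
To illustrate the idea behind Markov chain text generation, here is a minimal, self-contained sketch. It is not ReMarkov's actual implementation; the names `build_chain` and `generate` and the `order`, `length`, and `seed` parameters are hypothetical and exist only for this example. The chain records which words were observed after each word (or run of words), and generation is a random walk over those observations.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random state, picking successors at random."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        successors = chain.get(state)
        if not successors:
            # Dead end: this state was only seen at the very end of the sample.
            break
        next_word = rng.choice(successors)
        output.append(next_word)
        # Slide the window forward by one word.
        state = state[1:] + (next_word,)
    return " ".join(output)


sample = "this is a sample text and this is another"
chain = build_chain(sample)
print(chain[("is",)])  # ['a', 'another']
print(generate(chain, length=12, seed=1))
```

Because "is" is followed by both "a" and "another" in the sample, the walk can either loop back through "a sample text and this is" or stop at "another". This is why text generated from a very small sample tends to repeat itself.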

Installation

pip3 install remarkov

Example

Scrape the Wikipedia page for "Computer Programming" and generate a new text from it:

./tools/scrape-wiki.py Computer_programming | remarkov build | remarkov generate

You can also use remarkov programmatically:

from remarkov import create_model

model = create_model()
model.add_text("This is a sample text and this is another.")

print(model.generate().text())
# Example output: "This is a sample text and this is a sample text and this is a sample text ..."

Development

Make sure you run pytest as a module. This adds the current directory to the import path:

python3 -m pytest

This project uses black for source code formatting:

black .

Generate documentation for the project (this uses the original pdoc at pdoc.dev):

git checkout gh-pages
pdoc -t pdoc/template -o public/docs <path_to_remarkov_module>

Run type checks using mypy:

mypy -p remarkov

Publishing is done like this (don't forget to bump the version in setup.py):

pip3 install twine # optional

git tag -a <version>
git push --tags

python3 setup.py clean --all
python3 setup.py sdist bdist_wheel
twine check "dist/*"
twine upload "dist/*"

Comments

Release schedule
[x] Add source code documentation
[x] Improve explanation on website
[x] Adapt syntax highlighting in docs
[x] Generate samples for showcase
    [x] Articles
    [x] Birthday
    [x] Horoscope
    [x] Utterance
[x] Enable gh-pages

Opened by lausek.

Releases (v0.2.3)

v0.2.3 (Jan 15, 2022)
ReMarkov Example Datasets - EN

Based on:

https://github.com/kavgan/OpinRank (Cars, Hotels)

https://github.com/dsnam/markovscope (Horoscopes)

https://github.com/hmi-utwente/video-game-text-corpora (NPC)

ReMarkov Wikipedia Scraper (Blockchain)

Source code (tar.gz)
Source code (zip)
remarkov-dataset.7z (6.16 MB)
remarkov-dataset.zip (9.05 MB)
