Simple integer-valued time series bit packing

Overview

Smahat Time Series Encoding

Smahat allows to encode a sequence of integer values using a fixed (for all values) number of bits but minimal with regards to the data range. For example: for a series of boolean values only one bit is needed, for a series of integer percentages 7 bits are needed, etc.

Smahat is useful when:

  • Time series is integer-valued. (It doesn't work with floats :))
  • The range of the data is known in advance (if not streaming, this is not necessary).
  • The data range is relatively small.
  • The data does not have properties that would make other compression algorithms useful, or these other algorithms have an unacceptable cost for the use case.

Smahat can also be used as a baseline to calculate the true compression ratio of a compression algorithm on data of a certain nature.

Installation

To install the latest release:

$ pip install smahat

You can also build a local package and install it:

$ make build
$ pip install dist/*.whl

Usage

Import smahat module.

>>> import smahat

Data to encode.

>>> values = [12, 0, 17, 15, 78, 10]

Encoding

You can use encode_next to encode one value by one:

>>> encoder = smahat.Encoder(min_value=0, max_value=100, strategy='saturate')
>>> for v in values:
...     encoder.encode_next(v)
>>> content = encoder.get_encoded()
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Or you can use Encoder.encode_all to encode all values (range min and max will be inferred from values if not provided):

>>> content = smahat.Encoder.encode_all(values, min_value=0, max_value=100, strategy='saturate')
>>> content
{'encoded': b'\x18\x00\x88\xf9\xc2\x80', 'shift': 0, 'n_bits_per_value': 7, 'n_padding_bits': 6}

Decoding

To decode use Decoder.decode_all.

>>> smahat.Decoder.decode_all(content)
[12, 0, 17, 15, 78, 10]

Encoding result

The result of the encoding of a sequence of values using Smahat is a SmahatContent dictionary containing the encoded data, plus three fields : shift is used to bring the data range to start from zero (values are shifted and encoded in pure binary), n_bits_per_value indicates the number of bits used to encode each value, n_padding_bits (between 0 and 7) indicates the number of unused padding bits within the last byte.

class SmahatContent(TypedDict):
    encoded: bytes
    shift: int
    n_bits_per_value: int
    n_padding_bits: int

If you want to use this library for message exchanges, you can serialize the result of the encoding as you like (JSON, protobuf, etc.)

Contribute

$ git clone https://github.com/ghilesmeddour/smahat-time-series-encoding.git
$ cd smahat-time-series-compression
make format
make dead-code-check
make test
make type-check
make coverage
make build

TODOs

  • Add unit tests.
  • Improve doc.
You might also like...
Simple yet flexible natural sorting in Python.

natsort Simple yet flexible natural sorting in Python. Source Code: https://github.com/SethMMorton/natsort Downloads: https://pypi.org/project/natsort

A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!
A Python utility belt containing simple tools, a stdlib like feel, and extra batteries. Hashing, Caching, Timing, Progress, and more made easy!

Ubelt is a small library of robust, tested, documented, and simple functions that extend the Python standard library. It has a flat API that all behav

Simple python module to get the information regarding battery in python.
Simple python module to get the information regarding battery in python.

Battery Stats A python3 module created for easily reading the current parameters of Battery in realtime. It reads battery stats from /sys/class/power_

A simple Python app that generates semi-random chord progressions.

chords-generator A simple Python app that generates semi-random chord progressions.

A simple and easy to use Spam Bot made in Python!

This is a simple spam bot made in python. You can use to to spam anyone with anything on any platform.

Runes - Simple Cookies You Can Extend (similar to Macaroons)

Runes - Simple Cookies You Can Extend (similar to Macaroons) is a paper called "Macaroons: Cookies with Context

Simple collection of GTPS Flood in Python.

GTPS Flood Simple collection of GTPS Flood in Python. NOTE Give me credit if you use this source, don't trade/sell this tool, And USE AT YOUR OWN RISK

Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method.

Astvuln Astvuln is a simple AST scanner which recursively scans a directory, parses each file as AST and runs specified method. Some search methods ar

A set of Python scripts to surpass human limits in accomplishing simple tasks.

Human benchmark fooler Summary A set of Python scripts with Selenium designed to surpass human limits in accomplishing simple tasks available on https

Releases(v0.0.1)
Owner
Ghiles Meddour
Data Analyst at Munic
Ghiles Meddour
Dependency Injector is a dependency injection framework for Python.

What is Dependency Injector? Dependency Injector is a dependency injection framework for Python. It helps implementing the dependency injection princi

ETS Labs 2.6k Jan 04, 2023
✨ Un chois aléatoire d'un article sur Wikipedia totalement fait en Python par moi, et en français.

Wikipedia Random Article ❗ Un chois aléatoire d'un article sur Wikipedia totalement fait en Python par moi, et en français. 🔮 Grâce a une requète a w

MrGabin 4 Jul 18, 2021
Networkx with neo4j back-end

Dump networkx graph into nodes/relations TSV from neo4jnx.tsv import graph_to_tsv g = pklload('indranet_dir_graph.pkl') graph_to_tsv(g, 'docker/nodes.

Benjamin M. Gyori 1 Oct 27, 2021
Numbers-parser - Python module for parsing Apple Numbers .numbers files

numbers-parser numbers-parser is a Python module for parsing Apple Numbers .numbers files. It supports Numbers files generated by Numbers version 10.3

Jon Connell 154 Jan 05, 2023
Attempts to crack the compression puzzle.

The Compression Puzzle One lovely Friday we were faced with this nice yet intriguing programming puzzle. One shall write a program that compresses str

Oto Brglez 14 Dec 29, 2022
a simple function that randomly generates and applies console text colors

ChangeConsoleTextColour a simple function that randomly generates and applies console text colors This repository corresponds to my Python Functions f

Mariya 6 Sep 20, 2022
A fixture that allows runtime xfail

pytest-runtime-xfail pytest plugin, providing a runtime_xfail fixture, which is callable as runtime_xfail(), to allow runtime decisions to mark a test

Brian Okken 4 Apr 06, 2022
Build capture utility for Linux

CX-BUILD Compilation Database alternative Build Prerequisite the CXBUILD uses linux system call trace utility called strace which was customized. So I

GLaDOS (G? L? Automatic Debug Operation System) 3 Nov 03, 2022
Protect your eyes from eye strain using this simple and beautiful, yet extensible break reminder

Protect your eyes from eye strain using this simple and beautiful, yet extensible break reminder

Gobinath 1.2k Jan 01, 2023
A python app which aggregates and splits costs from multiple public cloud providers into a csv

Cloud Billing This project aggregates the costs public cloud resources by accounts, services and tags by importing the invoices from public cloud prov

1 Oct 04, 2022
A clock app, which helps you with routine tasks.

Clock This app helps you with routine tasks. Alarm Clock Timer Stop Watch World Time (Which city you want) About me Full name: Matin Ardestani Age: 14

Matin Ardestani 13 Jul 30, 2022
Dependency injection lib for Python 3.8+

PyDI Dependency injection lib for python How to use To define the classes that should be injected and stored as bean use decorator @component @compone

Nikita Antropov 2 Nov 09, 2021
Bounding Boxes Python Utils

Bounding Boxes Python Utils

Vadim 4 May 01, 2022
An awesome tool to save articles from RSS feed to Pocket automatically.

RSS2Pocket An awesome tool to save articles from RSS feed to Pocket automatically. About the Project I used to use IFTTT to save articles from RSS fee

Hank Liao 10 Nov 12, 2022
Two fast AUC calculation implementations for python

fastauc Two fast AUC calculation implementations for python: python-based is approximately 5X faster than the default sklearn.metrics.roc_auc_score()

Vsevolod Kompantsev 26 Dec 11, 2022
Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Python based utilities for interacting with digital multimeters that are built on the FS9721-LP3 chipset.

Fergus 1 Feb 02, 2022
Finds price floor for every single attribute in a given collection

Solana Solanart Scanner Enjoy the Free Code Steps to run Download VS Code

Dalton Nisbett 19 Oct 20, 2022
Analyze metadata of your Python project.

Analyze metadata of your Python projects Setup: Clone repo py-m venv venv (venv) pip install -r requirements.txt specify the folders which you want to

Pedro Monteiro de Carvalho e Silva Prado 1 Nov 10, 2021
Easy compression and extraction for any compression or archival format.

Tzar: Tar, Zip, Anything Really Easy compression and extraction for any compression or archival format. Usage/Examples tzar compress large-dir compres

DanielVZ 37 Nov 02, 2022
Dice Rolling Simulator using Python-random

Dice Rolling Simulator As the name of the program suggests, we will be imitating a rolling dice. This is one of the interesting python projects and wi

PyLaboratory 1 Feb 02, 2022