Tuple-sum-filter - Library to play with filtering numeric sequences by sums of their pairs, triplets, etc. With a bonus CLI demo

Overview

Tuple Sum Filter

A library to play with filtering numeric sequences by sums of their pairs, triplets, etc.

Comes with a bonus CLI to demo the functionality.

Requires (and CI tests on) python 3.8 to 3.10. If you need to use python 3.7 then try replacing math.prod(some_iterable) with functools.reduce(lambda x, y: x * y, some_iterable)

Approach

We're thinking of this mostly as a library with the CLI as only for demo purposes. Ways you can see this in the code:

  • logging should really handled by the consumer,
    • our get_logger should be something that is passed into the lib
  • the CLI is pretty light on automated tests
  • we use pretty loose production dependency pinning
    • rather than pip freeze > requirements.txt of a deployed app
    • we want to keep things loose so that consumers can keep installing us alongside other things
    • we should probably set up tox/nox test runs against v.latest of our dependencies

Running the demo

in a fresh virtualenv (python>=3.8)

# install project and deps
pip install git+https://github.com/lbillingham/tuple-sum-filter.git

# create a suitable input file
echo "1721\n979\n366\n299\n675\n1456\n" > example.txt

# run the demo
filter_demo --input_file=example.txt --sum_target=2020 --dimension=2

you should see output like

checking for pairs of numbers that sum to 2020 in example.txt
Results pair (1721, 299) match: sum to 2020 and multiply to 514579

Consuming the library

The main filtering functions are pairs_that_sum_to and triplets_that_sum_to. They both have signatures (numbers: Sequence[int|float], sum_target: int|float) -> things_that_passed_the_filter list[tuple].

There is also a file-reading helper numbers_in_file exported at the top level.

Developing

Run the following to install the project (and dev dependencies) into your active virtualenv:

make dev_install

day-to-day development tasks can be orchestrated via make

  • dependency management
  • test/lint/typecheck running
  • coverage reporting
  • run make without any arguments to see a list

There is a CI suite which runs lint and test on several python versions. We don't run typechecking as a gate in CI because we think that turns a sometimes-useful tool into a Goodhart target.

Performance

We have not been optimizing for performance and it kind of shows.

When we run the benchmarking suite we see ~0.4 seconds fairly consistently for the triplet/3D problem.

We have at least 3 ideas of how to speed things up: several of them include dropping floating-point support.

$ make benchmark

tests/performance_check.py ..                                                                                                                                [100%]


------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          5.4665 (1.0)        6.2297 (1.0)        5.6687 (1.0)      0.1018 (1.0)        5.6575 (1.0)      0.1289 (1.0)          47;3  176.4077 (1.0)         172           1
test_input1_triplets     384.6154 (70.36)    386.5000 (62.04)    385.4776 (68.00)    0.8287 (8.14)     385.4333 (68.13)    1.5047 (11.67)         2;0    2.5942 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

🍪 ✂️ cookiecut from lbillingham's python-cli-template

You might also like...
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

Python Common things by Problem Fighter Library, (Exception, Debug Log, etc.)

In the name of God, the Most Gracious, the Most Merciful. PF-PY-Common Documentation Install and update using pip: pip install -U xxxx Please find the

Devil - Very Semple Auto Filter V1 Bot
Devil - Very Semple Auto Filter V1 Bot

Devil Very Semple Auto Filter V1 Bot

Cairo-bloom - A naive bloom filter implementation in Cairo

🥀 cairo-bloom A naive bloom filter implementation in Cairo. A Bloom filter is a

Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.
Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Nanopore-Workflow Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human gen

Runnable Python demo of ArtLine

artline-demo How to run? pip3 install -r requirements.txt python3 app.py How to use? Run the Flask app Open localhost:5000 in browser Select an image(

Tiny demo site for exploring SameSite=Lax

samesite-lax-demo Background on my blog: Exploring the SameSite cookie attribute for preventing CSRF This repo holds some tools for exploring the impl

An extended version of the hotkeys demo code using action classes

An extended version of the hotkeys application using action classes. In adafruit's Hotkeys code, a macro is using a series of integers, assumed to be

Comments
  • Perf: :zap:  merge if you want to go faster but don't need float support

    Perf: :zap: merge if you want to go faster but don't need float support

    This moves away from the shared itertools implimentations for finding pairs, triplets of the input numbers that sum to a given target.

    Instead, we

    • 1st expose the underlying $~O^{dimensions}$ nested loops
    • trade some extra memory and some $O^{1}$ lookups to give us $~O^{dimensions-1}$
    • get a >= 170x speedup in our benchmarks

    However, we:

    • loose the ability to properly work with floating point input
      • the fast lookup uses hasing and hashing floats gets weird due to floating point equality
    • can't trivially extend to higher-dimension problems: 4-element-tuples etc.

    I've moved the float input tests out the their own file and away from the CI test path

    Performance benchmarks

    with these changes:

    $ make benchmark
    tests/performance_check.py ..                                                                                                                             [100%]
    
    -------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
    Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs           22.1660 (1.0)        166.3790 (1.0)         23.6089 (1.0)        5.0170 (1.0)         23.0000 (1.0)        0.5123 (1.0)        87;521  42,356.8183 (1.0)        7677           1
    test_input1_triplets     1,994.8000 (89.99)    3,561.1120 (21.40)    2,152.1272 (91.16)    204.1428 (40.69)    2,033.1040 (88.40)    299.1878 (584.07)       41;4     464.6565 (0.01)        341           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    itertools, but non-float supporting version

    $ make benchmark
    tests/performance_check.py ..                                                                                                      [100%]
    
    ------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
    Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs          2.8727 (1.0)        4.2386 (1.0)        3.1265 (1.0)      0.1638 (1.0)        3.1067 (1.0)      0.1888 (1.0)          78;9  319.8414 (1.0)         326           1
    test_input1_triplets     211.6325 (73.67)    213.3950 (50.35)    212.4042 (67.94)    0.6555 (4.00)     212.2717 (68.33)    0.8081 (4.28)          2;0    4.7080 (0.01)          5           1
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    =========================================================== 2 passed in 3.59s ============================================================
    

    Note the change in units this branch is in microseconds, the itertools version is in milliseconds

    opened by lbillingham 0
Releases(v0.0.1)
  • v0.0.1(Feb 17, 2022)

    Initial release. Lib allows filtering by sum over pairs and triplets of numbers loaded from a local file. Plus a bonus CLI app that can be used for demoing the lib.

    Solution is itertools-y and rather slow (probably $O^{n}$ where pairs->n=2 and triplets->n=3).

    This is the version shown to MM

    Source code(tar.gz)
    Source code(zip)
Owner
Laurence Billingham
+ sustainable software + data science
Laurence Billingham
Application to list countries in order of travel from the United States.

Application to list countries in order of travel from the United States.

Broden Wanner 1 Nov 03, 2021
Msgpack serialization/deserialization library for Python, written in Rust using PyO3 and rust-msgpack. Reboot of orjson. msgpack.org[Python]

ormsgpack ormsgpack is a fast msgpack library for Python. It is a fork/reboot of orjson It serializes faster than msgpack-python and deserializes a bi

Aviram Hassan 139 Dec 30, 2022
A 3D Slicer Extension to view data from the flywheel heirarchy

flywheel-connect A 3D Slicer Extension to view, select, and download images from a Flywheel instance to 3D Slicer and storing Slicer outputs back to F

4 Nov 05, 2022
ELF file deserializer and serializer library

elfo ELF file deserializer and serializer library. import elfo elf = elfo.ELF.from_path('main') elf ELF( header=ELFHeader( e_ident=e

Filipe Laíns 3 Aug 23, 2021
For when you really need to rank things

Comparisonator For when you really need to rank things. Do you know that feeling when there's this urge deep within you that tells you to compare thin

Maciej Wilczyński 1 Nov 01, 2021
Desenvolvendo as habilidades básicas de programação visando a construção de aplicativos por meio de bibliotecas apropriadas à Ciência de Dados.

Algoritmos e Introdução à Computação Ementa: Conceitos básicos sobre algoritmos e métodos para sua construção. Tipos de dados e variáveis. Estruturas

Dyanna Cruz 1 Jan 06, 2022
Advanced Variable Manager {AVM} [0.8.0]

Advanced Variable Manager {AVM} [0.8.0] By Grosse pastèque#6705 WARNING : This modules need some typing modifications ! If you try to run it without t

Big watermelon 1 Dec 11, 2021
Button paginator using discord_components

Button Paginator With discord-components Button paginator using discord_components Welcome! It's a paginator for discord-componets! Thanks to the orig

Decave 7 Feb 12, 2022
My qtile config with a fresh-looking bar and pywal support

QtileConfig My qtile config with a fresh-looking bar and pywal support. Note: This is my first rice and first github repo. Please excuse my poor codin

Eden 4 Nov 10, 2021
Mannaggia is a python application to praise or more likely to curse the saints

Mannaggia-py 👼 Remember Mannaggia? This is a Python remake of it, with new features. mannaggia is a python application to praise or more likely to cu

Christian Visintin 9 Aug 12, 2022
automate some stuff so I can be more noob

dota automate some stuff so I can be more noob This is a simple project, but one that I've wanted forever! I use pyautogui, time, smtplib and datetime

Aaron Allen 17 Oct 18, 2022
Apilytics for Python - Easy API analytics for Python backends

apilytics-python Installation Sign up and get your API key from https://apilytics.io - we offer a completely free trial with no credit card required!

Apilytics 6 Sep 29, 2022
A python library what works with numbers.

pynum A python library what works with numbers. Prime Prime class have everithing you want about prime numbers. check_prime The check_prime method is

Mohammad Mahdi Paydar Puya 1 Jan 07, 2022
Iss-tracker - ISS tracking script in python using NASA's API

ISS Tracker Tracking International Space Station using NASA's API and plotting i

Partho 9 Nov 29, 2022
Organize seu linux - organize your linux

OrganizeLinux Organize seu linux - organize your linux Organize seu linux Uma forma rápida de separar arquivos dispersos em pastas. formatos a serem c

Marcus Vinícius Ribeiro Andrade 1 Nov 30, 2021
urlwatch is intended to help you watch changes in webpages and get notified of any changes.

urlwatch is intended to help you watch changes in webpages and get notified (via e-mail, in your terminal or through various third party services) of any changes.

Thomas Perl 2.5k Jan 08, 2023
Webcash is an experimental e-cash (electronic cash)

Webcash Webcash is an experimental new electronic cash ("e-cash") that enables decentralized and instant payments to anyone, anywhere in the world. Us

Mark Friedenbach 0 Feb 26, 2022
COVID-19 case tracker in Dash

covid_dashy_personal This is a personal project to build a simple COVID-19 tracker for Australia with Dash. Key functions of this dashy will be to Dis

Jansen Zhang 1 Nov 30, 2021
Software for visualization of RTStruct structures on CT images

This script is responsible for the operation of the program, it is responsible for both creating the GUI and the process of processing images from dicom files. The program is based on the use of the

Adam Piszczek 0 Jun 29, 2022
Process GPX files (adding sensor metrics, uploading to InfluxDB, etc.) exported from imxingzhe.com

Xingzhe GPX Processor 行者轨迹处理工具 Xingzhe sells cheap GPS bike meters with sensor support including cadence, heart rate and power. But the GPX files expo

Shengqi Chen 8 Sep 23, 2022