Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Overview

featureclass

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

This library helps define a featureclass.
featureclass is inspired by dataclass, and is meant to provide alternative way to define features engineering classes.

I have noticed that the below code is pretty common when doing feature engineering:

from statistics import variance
from math import sqrt
class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint),
        out['stdev'] = self.calc_std(out['var'])
        return out
        
    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)

Some things were missing for me from this type of implementation:

  1. Implicit dependencies between features
  2. No simple schema
  3. No documentation for features
  4. Duplicate declaration of the same feature - once as a function and one as a dict key

This is why I created this library.
I turned the above code into this:

float: """Calc stdev""" return sqrt(self.var) print(feature_names(MyFeatures)) # ('var', 'stdev') print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float} print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898} print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5) ">
from featureclass import feature, featureclass, feature_names, feature_annotations, asDict, asDataclass
from statistics import variance
from math import sqrt

@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint
    
    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)

print(feature_names(MyFeatures)) # ('var', 'stdev')
print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float}
print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898}
print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5)

The feature decorator is using cached_property to cache the feature calculation,
making sure that each feature is calculated once per datapoint

You might also like...
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

Bazel rules to install Python dependencies with Poetry

rules_python_poetry Bazel rules to install Python dependencies from a Poetry project. Works with native Python rules for Bazel. Getting started Add th

An assistant to guess your pip dependencies from your code, without using a requirements file.

Pip Sala Bim is an assistant to guess your pip dependencies from your code, without using a requirements file. Pip Sala Bim will tell you which packag

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.
Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it. This repo will help you make a webserver with a bit of console controls.

A Puzzle A Day Keep the Work Away

A Puzzle A Day Keep the Work Away No moyu again!

Keep your company's passwords behind the firewall

TeamVault TeamVault is an open-source web-based shared password manager for behind-the-firewall installation. It requires Python 3.3+ and Postgres (wi

A 100% python file organizer. Keep your computer always organized!

PythonOrganizer A 100% python file organizer. Keep your computer always organized! To run the project, just clone the folder and run the installation

School helper, helps you at your pyllabus's.
School helper, helps you at your pyllabus's.

pyllabus, helps you at your syllabus's... WARNING: It won't run without config.py! You should add config.py yourself, it will include your APIKEY. e.g

Ssma is a tool that helps you collect your badges in a satr platform
Ssma is a tool that helps you collect your badges in a satr platform

satr-statistics-maker ssma is a tool that helps you collect your badges in a satr platform ๐ŸŽ–๏ธ Requirements python = 3.7 Installation first clone the

Releases(0.3.0)
  • 0.3.0(Jan 19, 2022)

    What's Changed

      • rename asDict to as_dict and asDataclass to as_dataclass by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/4

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.1...0.3.0

    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Jan 11, 2022)

    What's Changed

    • docs, formatting and tests by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/1
    • fine-tune mypy type ignore by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/3

    New Contributors

    • @Itayazolay made their first contribution in https://github.com/Itayazolay/featureclass/pull/1

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.0...0.2.1

    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Jan 9, 2022)

A Python wrapper API for operating and working with the Neo4j Graph Data Science (GDS) library

gdsclient NOTE: This is a work in progress and many GDS features are known to be missing or not working properly. This repo hosts the sources for gdsc

Neo4j 100 Dec 20, 2022
ColabFold / AlphaFold2_advanced on your local PC (or macOS)

LocalColabFold ColabFold / AlphaFold2_advanced on your local PC (or macOS) Installation For Linux Make sure curl and wget commands are already install

Yoshitaka Moriwaki 207 Dec 22, 2022
Improved version calculator, now using while True and etc

CalcuPython_2.0 Olรก! Calculadora versรฃo melhorada, agora usando while True e etc... melhorei o design e os carai tudo (rode no terminal, pra melhor ex

Scott 2 Jan 27, 2022
run-js Goal: The Easiest Way to Run JavaScript in Python

run-js Goal: The Easiest Way to Run JavaScript in Python features Stateless Async JS Functions No Intermediary Files Functional Programming CommonJS a

Daniel J. Dufour 9 Aug 16, 2022
Simple web application, which has a single endpoint, dedicated to annotation parsing and convertion.

Simple web application, which has a single endpoint, dedicated to annotation parsing and conversion.

Pavel Paranin 1 Nov 01, 2021
List of short Codeforces problems with a statement of 1000 characters or less. Python script and data files included.

Shortest problems on Codeforces List of Codeforces problems with a short problem statement of 1000 characters or less. Sorted for each rating level. B

32 Dec 24, 2022
This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols.

Regex-Reverser This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols. Screenshots Usage/Examples py

x19 0 Jun 02, 2022
python DroneCAN code generation, interface and utilities

UAVCAN v0 stack in Python Python implementation of the UAVCAN v0 protocol stack. UAVCAN is a lightweight protocol designed for reliable communication

DroneCAN 11 Dec 12, 2022
Simple tooling for marking deprecated functions or classes and re-routing to the new successors' instance.

pyDeprecate Simple tooling for marking deprecated functions or classes and re-routing to the new successors' instance

Jirka Borovec 45 Nov 24, 2022
This is an independent project to track Nubank expenses

Nubank expense tracker This is an independent project to track Nubank expenses. To fetch Nubank data we are going to use an unofficial Nubank API, tha

Ramon Gazoni Lacerda 0 Aug 28, 2022
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python

Scalene: a high-performance CPU, GPU and memory profiler for Python by Emery Berger, Sam Stern, and Juan Altmayer Pizzorno. Scalene community Slack Ab

PLASMA @ UMass 7k Dec 30, 2022
script buat mengcrack

setan script buat mengcrack cara install $ pkg install upgrade && pkg update $ pkg install python $ pkg install git $ pip install requests $ pip insta

1 Nov 03, 2021
๐ŸŽ‰ ๐ŸŽ‰ PyComp - Java Code compiler written in python.

๐ŸŽ‰ ๐ŸŽ‰ PyComp Java Code compiler written in python. This is yet another compiler meant for babcock students project which was created using pure python

Alumona Benaiah 5 Nov 30, 2022
A python script to decrypt media files encrypted using the Android application 'Secret Calculator Photo Vault'. Supports brute force of PIN also.

A python script to decrypt media files encrypted using the Android application 'Secret Calculator Photo Vault'. Supports brute force of PIN also.

3 May 01, 2022
Sorter makes file organisation and management easier.

Sorter Sorter makes file organisation easier. It simply helps you organise several files that contain similar characteristics into a single folder. Yo

Aswa Paul 34 Aug 14, 2022
Battery conservation Python script for ubuntu to enable battery conservation mode at 60% 80% or 90%

Description Batteryconservation is a small python script wich creates an appindicator for ubuntu which can be used to enable / disable battery conserv

3 Jan 04, 2022
Basic repository showing how to use Hydra + Hydra launchers on SLURM cluster

Slurm-Hydra-Submitit This repository is a minimal working example on how to: setup Hydra setup batch of slurm jobs on top of Hydra via submitit-launch

Raphael Meudec 2 Jul 25, 2022
A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt

A simple python script where the user inputs the current ingredients they have in their kitchen into ingredients.txt and then runs the main.py script, and it will output what recipes can be created b

Jordan Leich 3 Nov 02, 2022
Turn your IPad into a Screen-Slaver with 1 simple Pythonista script

ScreenSlaver Turn your IPad into a Screen-Slaver with 1 simple Pythonista script

6 Jul 09, 2022
0xFalcon - 0xFalcon Tool For Python

0xFalcone Installation Install 0xFalcone Tool: apt install git git clone https:/

Alharb7 6 Sep 24, 2022