A different spin on dataclasses.

Last update: Nov 18, 2022

dataklasses

Dataklasses is a library that allows you to quickly define data classes using Python type hints. Here's an example of how you use it:

from dataklasses import dataklass

@dataklass
class Coordinates:
    x: int
    y: int

The resulting class works in a well-civilised way, providing the usual __init__(), __repr__(), and __eq__() methods that you'd normally have to type out by hand:

>>> a = Coordinates(2, 3)
>>> a
Coordinates(2, 3)
>>> a.x
2
>>> a.y
3
>>> b = Coordinates(2, 3)
>>> a == b
True
>>>

It's easy! Almost too easy.
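For comparison, here is a rough sketch of what you'd otherwise be typing out by hand. It is an approximation of the behaviour shown above, not the exact code that dataklasses generates:

```python
class Coordinates:
    # Lets the class take part in positional pattern matching.
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{type(self).__name__}({self.x!r}, {self.y!r})"

    def __eq__(self, other):
        # Only compare against instances of exactly the same class.
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented


a = Coordinates(2, 3)
b = Coordinates(2, 3)
print(a)        # Coordinates(2, 3)
print(a == b)   # True
```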

Wait, doesn't this already exist?

No, it doesn't. Yes, certain naysayers will be quick to point out the existence of @dataclass from the standard library. OK, sure, THAT exists. However, it's slow and complicated. Dataklasses are neither of those things. The entire dataklasses module is fewer than 100 lines. The resulting classes import 15-20 times faster than dataclasses. See the perf.py file for a benchmark.
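If you want a feel for the cost yourself, the following is a minimal sketch of how class-definition time could be measured. It is not perf.py. The names time_class_definition, STDLIB_SRC, and PLAIN_SRC are made up for illustration, and it only compares the standard library decorator against a bare class. Timings will vary by machine and Python version:

```python
import timeit

# Source for a class built with the standard library decorator.
STDLIB_SRC = """
from dataclasses import dataclass

@dataclass
class C:
    a: int
    b: int
    c: int
"""

# Baseline: the same annotations with no decorator at all.
PLAIN_SRC = """
class C:
    a: int
    b: int
    c: int
"""


def time_class_definition(source, repeat=100):
    """Return the average seconds needed to execute `source` once."""
    run = lambda: exec(source, {"__name__": "bench_ns"})
    return timeit.timeit(run, number=repeat) / repeat


if __name__ == "__main__":
    print("stdlib dataclass:", time_class_definition(STDLIB_SRC))
    print("plain class:     ", time_class_definition(PLAIN_SRC))
```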

Theory of Operation

While out walking with his puppy, Dave had a certain insight about the nature of Python byte-code. Coming back to the house, he had to try it out:

>>> def __init1__(self, x, y):
...     self.x = x
...     self.y = y
...
>>> def __init2__(self, foo, bar):
...     self.foo = foo
...     self.bar = bar
...
>>> __init1__.__code__.co_code == __init2__.__code__.co_code
True
>>>

How intriguing! The underlying byte-code is exactly the same even though the functions are using different argument and attribute names. Aha! Now, we're onto something interesting.
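So where did the names go? A short sketch (using the plain names init1 and init2 for the two functions) shows that the byte-code refers to names only by index. The names themselves live in separate tuples on the code object:

```python
def init1(self, x, y):
    self.x = x
    self.y = y


def init2(self, foo, bar):
    self.foo = foo
    self.bar = bar


code1, code2 = init1.__code__, init2.__code__

# The instruction stream is identical...
same_bytecode = code1.co_code == code2.co_code
print(same_bytecode)        # True

# ...because argument names and attribute names are stored on the side.
print(code1.co_varnames)    # ('self', 'x', 'y')
print(code1.co_names)       # ('x', 'y')
print(code2.co_varnames)    # ('self', 'foo', 'bar')
print(code2.co_names)       # ('foo', 'bar')
```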

The dataclasses module in the standard library works by collecting type hints, generating code strings, and executing them using the exec() function. This happens for every single class definition where it's used. If it sounds slow, that's because it is. In fact, it defeats any benefit of module caching in Python's import system.
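To make that concrete, here is a heavily simplified sketch of the generate-a-string-and-exec() approach. The helper name make_init_with_exec is hypothetical, and the real standard library code handles far more (defaults, field options, and so on):

```python
def make_init_with_exec(fields):
    """Build an __init__ by writing out source text and exec()-ing it."""
    args = ", ".join(fields)
    body = "\n".join(f"    self.{name} = {name}" for name in fields)
    source = f"def __init__(self, {args}):\n{body}\n"
    namespace = {}
    exec(source, namespace)   # compiles and runs the text on every call
    return namespace["__init__"]


class Point:
    __init__ = make_init_with_exec(["x", "y"])


p = Point(1, 2)
print(p.x, p.y)   # 1 2
```

Every class definition pays for the string building, parsing, and compilation again, even when the only thing that changed is the names.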

Dataklasses are different. They start out in the same manner--code is first generated by collecting type hints and using exec(). However, the underlying byte-code is cached and reused in subsequent class definitions whenever possible.

A Short Story

Once upon a time, there was this programming language that I'll refer to as "Lava." Anyways, anytime you started a program written in Lava, you could just tell by the awkward silence and inactivity of your machine before the fans kicked in. "Ah shit, this is written in Lava" you'd exclaim.

Questions and Answers

Q: What methods does dataklass generate?

A: By default, __init__(), __repr__(), and __eq__() methods are generated. __match_args__ is also defined to assist with pattern matching.

Q: Does dataklass enforce the specified types?

A: No. The types are merely clues about what the value might be and the Python language does not provide any enforcement on its own.
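The standard library decorator behaves the same way in this respect, so it serves to illustrate the point below. The check_types helper is hypothetical, is not part of dataklasses, and only copes with annotations that are plain classes:

```python
from dataclasses import dataclass


@dataclass
class Coordinates:
    x: int
    y: int


# Nothing stops you from passing values that ignore the hints.
c = Coordinates("two", 3.5)
print(c)   # Coordinates(x='two', y=3.5)


def check_types(obj):
    """Hypothetical helper: raise TypeError if a field ignores its hint."""
    for name, expected in type(obj).__annotations__.items():
        value = getattr(obj, name)
        if not isinstance(value, expected):
            raise TypeError(f"{name} should be {expected.__name__}, got {value!r}")


check_types(Coordinates(2, 3))   # passes silently
```

If you want enforcement, you have to bring it yourself, whether through a helper like this or an external type checker.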

Q: Are there any additional features?

A: No. You can either have features or you can have performance. Pick one.

Q: Does dataklass use any advanced magic such as metaclasses?

A: No.

Q: How do I install dataklasses?

A: There is no setup.py file, installer, or an official release. You install it by copying the code into your own project. dataklasses.py is small. You are encouraged to modify it to your own purposes.

Q: But what if new features get added?

A: What new features? The best new features are no new features.

Q: Who maintains dataklasses?

A: If you're using it, you do. You maintain dataklasses.

Q: Who wrote this?

A: dataklasses is the work of David Beazley. http://www.dabeaz.com.
