Simple, light-weight config handling through python data classes with to/from JSON serialization/deserialization.

Last update: Nov 29, 2022

Overview

👩‍✈️ Coqpit

Work in progress... 🌡️

❔ Why I need this

What I need from an ML configuration library...

1. Fixing a general config schema in Python to guide users about expected values.

Python is good but not universal. Sometimes you train an ML model and use it on a different platform, so you need your model configuration file to be importable by other programming languages.

2. Simple dynamic value and type checking with default values.

If you are a beginner in an ML project, it is hard to guess the right values for your ML experiment. Therefore, it is important to have some default values and to know what range and type of input are expected for each field.

3. Ability to decompose large configs.

As you define more fields for the training dataset, data preprocessing, model parameters, etc., your config file tends to get quite large, but in most cases it can be decomposed, which improves flexibility and readability.

4. Inheritance and nested configurations.

This helps to keep configurations consistent and easier to maintain.

5. Ability to override values from the command line when necessary.

For instance, you might need to define a path for your dataset, and this changes for almost every run. The user should then be able to override this value easily from the command line.

It also allows easy hyper-parameter search without changing your original code. Basically, you can run different models with different parameters using only command line arguments.

6. Defining dynamic or conditional config values.

Sometimes you need to define certain values depending on other values. Using Python helps to define the underlying logic for such config values.

7. No dependencies

You don't want to install a ton of libraries for just configuration management. If you install one, then it is better to be just native python.
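
The value checking and conditional values described above can be sketched with nothing but the standard library. This is only an illustration using plain dataclasses, not Coqpit's own API; the class `TrainConfig` and its fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainConfig:
    # Hypothetical fields, chosen only for illustration.
    batch_size: int = 32
    grad_accum_steps: int = 1
    # Left as None so it can be derived from the two fields above.
    effective_batch_size: Optional[int] = None

    def __post_init__(self):
        # Simple range check with a clear error message.
        if not 1 <= self.batch_size <= 4096:
            raise ValueError(f"batch_size out of range: {self.batch_size}")
        # Conditional value: computed only when the user did not set it.
        if self.effective_batch_size is None:
            self.effective_batch_size = self.batch_size * self.grad_accum_steps


config = TrainConfig(batch_size=16, grad_accum_steps=4)
print(config.effective_batch_size)  # 64
```

Because the logic lives in ordinary Python, the derived value stays in sync with the fields it depends on, and an out-of-range value fails at construction time rather than in the middle of a run.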

🔍 Examples

👉 Serialization

import os
from dataclasses import asdict, dataclass, field
from coqpit import Coqpit, check_argument
from typing import List, Union


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_c: str = "Coqpit is great!"

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_c', c, restricted=True)


@dataclass
class NestedConfig(Coqpit):
    val_d: int = 10
    val_e: int = None
    val_f: str = "Coqpit is great!"
    sc_list: List[SimpleConfig] = None
    sc: SimpleConfig = field(default_factory=lambda: SimpleConfig())  # instance defaults need a factory on Python 3.11+
    union_var: Union[List[SimpleConfig], SimpleConfig] = field(default_factory=lambda: [SimpleConfig(),SimpleConfig()])

    def check_values(self,):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_d', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_e', c, restricted=True, min_val=128, max_val=4058, allow_none=True)
        check_argument('val_f', c, restricted=True)
        check_argument('sc_list', c, restricted=True, allow_none=True)
        check_argument('sc', c, restricted=True, allow_none=True)


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    # init 🐸 dataclass
    config = NestedConfig()

    # save to a json file
    config.save_json(os.path.join(file_path, 'example_config.json'))
    # load a json file
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the json file.
    config2.load_json(os.path.join(file_path, 'example_config.json'))
    # now they should have the same values.
    assert config == config2

    # pretty print the dataclass
    config.pprint()

    # export values to a dict
    config_dict = config.to_dict()
    # create a new config with different values than the defaults
    config2 = NestedConfig(val_d=None, val_e=500, val_f=None, sc_list=None, sc=None, union_var=None)
    # update the config with the exported values from the previous config.
    config2.from_dict(config_dict)
    # now they should have the same values.
    assert config == config2
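
To make the round trip above more concrete, here is a minimal standard-library sketch of what saving a nested dataclass to JSON and restoring it involves. It is not how Coqpit implements `save_json` or `load_json`; the classes `Inner` and `Outer` and the helpers `to_json` and `outer_from_json` are hypothetical names for this illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Inner:
    val_a: int = 10


@dataclass
class Outer:
    val_d: int = 10
    sc: Inner = field(default_factory=Inner)


def to_json(config) -> str:
    # asdict() recurses into nested dataclasses, so the result is plain JSON data.
    return json.dumps(asdict(config), indent=4)


def outer_from_json(text: str) -> Outer:
    data = json.loads(text)
    # Nested dicts must be turned back into dataclass instances by hand.
    return Outer(val_d=data["val_d"], sc=Inner(**data["sc"]))


original = Outer(val_d=42, sc=Inner(val_a=7))
restored = outer_from_json(to_json(original))
assert restored == original
```

The manual reconstruction step in `outer_from_json` is the part a config library would be expected to automate from the type hints.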

👉 `argparse` handling and parsing.

import argparse
import os
from dataclasses import asdict, dataclass, field
from typing import List

from coqpit.coqpit import Coqpit, check_argument
import sys


@dataclass
class SimplerConfig(Coqpit):
    val_a: int = field(default=None, metadata={'help': 'this is val_a'})


@dataclass
class SimpleConfig(Coqpit):
    val_a: int = field(default=10,
                       metadata={'help': 'this is val_a of SimpleConfig'})
    val_b: int = field(default=None, metadata={'help': 'this is val_b'})
    val_c: str = "Coqpit is great!"
    mylist_with_default: List[SimplerConfig] = field(
        default_factory=lambda:
        [SimplerConfig(val_a=100),
         SimplerConfig(val_a=999)],
        metadata={'help': 'list of SimplerConfig'})

    # mylist_without_default: List[SimplerConfig] = field(default=None, metadata={'help': 'list of SimplerConfig'})  # NOT SUPPORTED YET!

    def check_values(self, ):
        '''Check config fields'''
        c = asdict(self)
        check_argument('val_a', c, restricted=True, min_val=10, max_val=2056)
        check_argument('val_b',
                       c,
                       restricted=True,
                       min_val=128,
                       max_val=4058,
                       allow_none=True)
        check_argument('val_c', c, restricted=True)


def main():
    file_path = os.path.dirname(os.path.abspath(__file__))

    # initial config
    config = SimpleConfig()
    config.pprint()

    # reference config that we would like the config above to match
    config_ref = SimpleConfig(val_a=222,
                              val_b=999,
                              val_c='this is different',
                              mylist_with_default=[
                                  SimplerConfig(val_a=222),
                                  SimplerConfig(val_a=111)
                              ])

    # create and init argparser with Coqpit
    parser = argparse.ArgumentParser()
    parser = config.init_argparse(parser)
    parser.print_help()
    args = parser.parse_args()

    # update the config from the parsed arguments
    config.from_argparse(args)
    config.pprint()
    # check the current config with the reference config
    assert config == config_ref


if __name__ == '__main__':
    sys.argv.extend(['--coqpit.val_a', '222'])
    sys.argv.extend(['--coqpit.val_b', '999'])
    sys.argv.extend(['--coqpit.val_c', 'this is different'])
    sys.argv.extend(['--coqpit.mylist_with_default.0.val_a', '222'])
    sys.argv.extend(['--coqpit.mylist_with_default.1.val_a', '111'])
    main()
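
The prefixed flags used above (for example `--coqpit.val_a`) can be mimicked with plain `argparse` and `dataclasses`. The sketch below is an assumption-laden illustration of the idea, not Coqpit's `init_argparse` or `from_argparse`; `DemoConfig`, `build_parser` and `apply_args` are hypothetical names, and only flat `int` and `str` fields are handled.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    val_a: int = 10
    val_c: str = "Coqpit is great!"


def build_parser(config, prefix: str = "coqpit") -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    for f in fields(config):
        # One flag per field, e.g. --coqpit.val_a; the current value is the default.
        parser.add_argument(
            f"--{prefix}.{f.name}",
            dest=f.name,
            type=f.type,
            default=getattr(config, f.name),
        )
    return parser


def apply_args(config, args: argparse.Namespace) -> None:
    # Copy every parsed value back onto the config object.
    for f in fields(config):
        setattr(config, f.name, getattr(args, f.name))


config = DemoConfig()
args = build_parser(config).parse_args(["--coqpit.val_a", "222"])
apply_args(config, args)
```

Passing an explicit argument list to `parse_args` avoids mutating `sys.argv`, which keeps the example easy to test.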

🤸‍♀️ Merging coqpits

import os
from dataclasses import dataclass
from coqpit.coqpit import Coqpit, check_argument


@dataclass
class CoqpitA(Coqpit):
    val_a: int = 10
    val_b: int = None
    val_d: float = 10.21
    val_c: str = "Coqpit is great!"


@dataclass
class CoqpitB(Coqpit):
    val_d: int = 25
    val_e: int = 257
    val_f: float = -10.21
    val_g: str = "Coqpit is really great!"


if __name__ == '__main__':
    file_path = os.path.dirname(os.path.abspath(__file__))
    coqpita = CoqpitA()
    coqpitb = CoqpitB()
    coqpitb.merge(coqpita)
    print(coqpitb.val_a)
    coqpitb.pprint()

Comments

Allow file-like objects when saving and loading

Allow users to save the configs to arbitrary locations through file-like objects. Would e.g. simplify coqui-ai/TTS#683 without adding an fsspec dependency to this library.

opened by agrinh 6

Latest PR causes an issue when a `Serializable` has default None

https://github.com/coqui-ai/coqpit/blob/5379c810900d61ae19d79b73b03890fa103487dd/coqpit/coqpit.py#L539

@reuben I am on it but if you have an easy fix go for it. Right now it breaks all the TTS trainings.

opened by erogol 2

[feature request] change the `arg_prefix` of coqpit

Is it possible to change the arg_prefix to another value or an empty string when using a Coqpit object? I see the option is supported in the code by changing arg_prefix, but I am not sure how to access it using the proposed API.

Thanks for the package, looks very useful!

opened by mosheman5 1

Setup CI to push new tags to PyPI automatically

I'm gonna add a workflow to automatically upload new tags to PyPI. @erogol when you have a chance could you transfer the coqpit project on PyPI to the coqui user?[0] Then you can add your personal account as a maintainer also, so you don't have to change your local setup.

In the meantime I'll iterate on testpypi.

[0] https://pypi.org/user/coqui/

opened by reuben 1

Fix rsetattr

rsetattr() is updated to pass the new test cases below.

I don't know if it is the right solution. It might be that rsetattr gets confused when coqpit is used as a prefix.

opened by erogol 0

[feature request] Warning when unexpected key is loaded but not present in class

Here is a toy scenario where it would be nice to have a warning:

from dataclasses import dataclass
from coqpit import Coqpit

@dataclass
class SimpleConfig(Coqpit):
    val_a: int = 10
    val_b: int = None

if __name__ == "__main__":
    config = SimpleConfig()

    tmp_config = config.to_dict()
    tmp_config["unknown_key"] = "Ignored value"
    config.from_dict(tmp_config)
    print(config.to_json())

Here the value of config.to_json() is:

{
    "val_a": 10,
    "val_b": null
}

Which is expected behaviour, but we should get a warning that some keys were ignored (IMO)
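
One way the requested warning could work is sketched below with plain dataclasses and the standard `warnings` module. This is not Coqpit's `from_dict`; the helper `update_from_dict` is a hypothetical name, and the sketch only handles flat fields.

```python
import warnings
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SimpleConfig:
    val_a: int = 10
    val_b: Optional[int] = None


def update_from_dict(config, data: dict) -> list:
    """Copy known keys from `data` onto `config`; warn about the rest."""
    known = {f.name for f in fields(config)}
    unknown = sorted(set(data) - known)
    if unknown:
        warnings.warn(f"Ignored unknown config keys: {unknown}")
    for key in known & set(data):
        setattr(config, key, data[key])
    return unknown


config = SimpleConfig()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ignored = update_from_dict(config, {"val_a": 20, "unknown_key": "Ignored value"})
```

Returning the list of ignored keys as well as warning lets callers decide whether to treat unexpected keys as an error.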

opened by WeberJulian 6

[feature request] Add `is_defined`

Use `coqpit.is_defined('field')` as a shorthand for the check `"field" in coqpit and coqpit.field is not None`.

It is a common condition when you parse out a coqpit object.
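
A minimal sketch of the requested check, written as a free function over a plain dataclass. The issue proposes a method on the config object; the standalone `is_defined` helper here is a hypothetical stand-in and not part of Coqpit's API as far as this page shows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleConfig:
    val_a: int = 10
    val_b: Optional[int] = None


def is_defined(config, name: str) -> bool:
    # True only when the attribute exists and is not None.
    return getattr(config, name, None) is not None


config = SimpleConfig()
```

With this helper, `is_defined(config, "val_a")` is true, while a field set to `None` and a missing field both count as undefined.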

opened by erogol 0

Allow grouping of argparse fields according to subclassing

When using inheritance to extend config definitions the resulting ArgumentParser has all fields flattened out. It would be nice to group fields by class and allow some control over ordering.

opened by reuben 2
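
The grouping idea in the issue above could be approached with `argparse` argument groups, one per class in the inheritance chain. The following is a rough standard-library sketch under that assumption, not an existing Coqpit feature; `BaseConfig`, `ModelConfig` and `build_grouped_parser` are hypothetical names.

```python
import argparse
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class BaseConfig:
    lr: float = 0.001


@dataclass
class ModelConfig(BaseConfig):
    hidden_size: int = 256


def build_grouped_parser(config_cls) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    seen = set()
    # Walk the MRO from base to subclass so inherited fields come first.
    for cls in reversed(config_cls.__mro__):
        if not is_dataclass(cls):
            continue
        group = parser.add_argument_group(cls.__name__)
        for f in fields(cls):
            if f.name in seen:
                continue  # already listed under the class that introduced it
            seen.add(f.name)
            group.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return parser


parser = build_grouped_parser(ModelConfig)
args = parser.parse_args(["--hidden_size", "512"])
```

Each field appears under the class that first defines it, so the help output lists `--lr` under `BaseConfig` and `--hidden_size` under `ModelConfig`.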

Releases(v0.0.17)

v0.0.17(Dec 21, 2022)
What's Changed

Raise error when unhinted list by @erogol in https://github.com/coqui-ai/coqpit/pull/37

Full Changelog: https://github.com/coqui-ai/coqpit/compare/v0.0.16...v0.0.17
v0.0.16(Apr 25, 2022)
What's Changed

Update README.md by @WeberJulian in https://github.com/coqui-ai/coqpit/pull/33

Deserialize using the default values if it exists by @Edresson in https://github.com/coqui-ai/coqpit/pull/35

New Contributors

@WeberJulian made their first contribution in https://github.com/coqui-ai/coqpit/pull/33

@Edresson made their first contribution in https://github.com/coqui-ai/coqpit/pull/35

Full Changelog: https://github.com/coqui-ai/coqpit/compare/v0.0.15...v0.0.16
v0.0.15(Feb 16, 2022)
What's Changed

Improve argparse UI for boolean flags by @reuben in https://github.com/coqui-ai/coqpit/pull/24

Allow file-like objects when saving and loading by @agrinh in https://github.com/coqui-ai/coqpit/pull/15

Revert "Allow file-like objects when saving and loading" by @erogol in https://github.com/coqui-ai/coqpit/pull/32

Add python 3.10 to CI by @erogol in https://github.com/coqui-ai/coqpit/pull/31

Full Changelog: https://github.com/coqui-ai/coqpit/compare/v0.0.14...v0.0.15
v0.0.14(Sep 3, 2021)

v0.0.13(Aug 30, 2021)

v0.0.12(Aug 26, 2021)

v0.0.11(Aug 25, 2021)

v0.0.10(Jun 23, 2021)

v0.0.9(Jun 4, 2021)

v0.0.8(May 26, 2021)

v0.0.8-alpha.1(May 20, 2021)

Testing automatic publish on release.
v0.0.8-alpha.0(May 20, 2021)

Testing automatic publish on release.

Owner

Eren Gölge

AI researcher @Coqui.ai
