A python library to build Model Trees with Linear Models at the leaves.

Last update: Dec 30, 2022

Overview

linear-tree

Linear Model Trees combine the learning ability of Decision Trees with the predictive and explanatory power of Linear Models. As in other tree-based algorithms, the data are split according to simple decision rules. The goodness of each split is evaluated in terms of gain, by fitting Linear Models in the nodes. This implies that the models in the leaves are linear, rather than the constant approximations used in classical Decision Trees.
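
As a rough illustration of how a split can be scored by fitting linear models in the child nodes, here is a minimal pure-Python sketch for a single feature. It is not the library's implementation: the helper names (fit_line, sse, best_split), the sum-of-squared-errors loss and the midpoint thresholds are all assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all x values identical: fall back to a constant model
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def sse(xs, ys):
    """Sum of squared errors of the least-squares line on (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def best_split(xs, ys, min_samples=2):
    """Try midpoints between consecutive sorted x values as thresholds.

    Returns (threshold, gain), where gain is the parent loss minus the
    summed loss of the two children, each fitted with its own line.
    """
    pairs = sorted(zip(xs, ys))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    parent_loss = sse(xs, ys)
    best = (None, 0.0)
    for i in range(min_samples, len(xs) - min_samples + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold can separate equal values
        threshold = (xs[i - 1] + xs[i]) / 2
        children_loss = sse(xs[:i], ys[:i]) + sse(xs[i:], ys[i:])
        gain = parent_loss - children_loss
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Toy data: y = x for x < 5 and y = 9 - x afterwards, so the linear
# regime changes between x = 4 and x = 5.
xs = [float(i) for i in range(10)]
ys = [x if x < 5 else 9.0 - x for x in xs]
threshold, gain = best_split(xs, ys)
print(threshold, gain)
```

On this toy data a single line fits poorly, while one line per side of the split fits each regime, which is why the split between the two regimes yields the largest gain.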

linear-tree is designed to integrate fully with scikit-learn. LinearTreeRegressor and LinearTreeClassifier are provided as scikit-learn BaseEstimator classes. They are wrappers that build a decision tree on the data, fitting a linear estimator from sklearn.linear_model in the nodes. All the models available in sklearn.linear_model can be used as linear estimators.

Installation

pip install linear-tree

The module depends on NumPy, SciPy and Scikit-Learn (>=0.23.0). Python 3.6 or above is supported.

Usage

Regression

from sklearn.linear_model import LinearRegression
from lineartree import LinearTreeRegressor
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4,
                       n_informative=2, n_targets=1,
                       random_state=0, shuffle=False)
regr = LinearTreeRegressor(base_estimator=LinearRegression())
regr.fit(X, y)

Classification

from sklearn.linear_model import RidgeClassifier
from lineartree import LinearTreeClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=100, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = LinearTreeClassifier(base_estimator=RidgeClassifier())
clf.fit(X, y)

More examples in the notebooks folder.

Check the API Reference to see the parameter configurations and the available methods.

Examples

Show the model tree structure:

Linear Tree Regressor at work:

Linear Tree Classifier at work:

Extract and examine coefficients at the leaves:

Comments

finding breakpoint

Hello,

Thank you for your nice tool. I am using LinearTreeRegressor to fit a continuous piecewise linear function. It works well. I am wondering: is it possible to show the location (the coordinates) of the breakpoints?

thank you

opened by ZhengLiu1119 5

Allow the hyperparameter "max_depth = 0".

Thanks for the good library.

When using LinearTreeRegressor, I think that max_depth is often optimized by cross-validation.

This library allows max_depth in the range 1-20. However, depending on the dataset, simple linear regression may be suitable. Even in such a dataset, max_depth is forced to be 1 or more, so Simple Linear Regression cannot be applied properly with LinearTreeRegressor.

Of course, it is appropriate to use sklearn.linear_model.LinearRegression for such datasets.

My suggestion is to change the program so that it uses base_estimator alone to perform the regression when "max_depth = 0". With this change, LinearTreeRegressor can flexibly handle both segmented regression and simple regression just by changing hyperparameters.

opened by jckkvs 4

Error when running with multiple jobs: unexpected keyword argument 'target_offload'

I have been using your library for quite a while and am super happy with it. So first, thanks a lot!

Lately, I used my framework (which also uses your library) on a modern many-core server with many jobs. It worked fine. Now I have updated everything via pip, and with 8 jobs on my MacBook I get the following error.

This error does not occur when using only a single job (I pass the number of jobs to n_jobs).

I cannot nail down the actual problem, but since it occurred right after the upgrade, I assume the upgrade might be the reason?

Am I doing something wrong here?

"""
Traceback (most recent call last):
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
    r = call_item()
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/lineartree/_classes.py", line 56, in __call__
    with config_context(**self.config):
  File "/Users/martin/opt/anaconda3/lib/python3.7/contextlib.py", line 239, in helper
    return _GeneratorContextManager(func, args, kwds)
  File "/Users/martin/opt/anaconda3/lib/python3.7/contextlib.py", line 82, in __init__
    self.gen = func(*args, **kwds)
TypeError: config_context() got an unexpected keyword argument 'target_offload'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "compression_selection_pipeline.py", line 41, in <module>
    model_pipeline.learn_runtime_models(calibration_result_dir)
  File "/Users/martin/Programming/compression_selection_v3/hyrise_calibration/model_pipeline.py", line 670, in learn_runtime_models
    non_splitting_models("table_scan", table_scans)
  File "/Users/martin/Programming/compression_selection_v3/hyrise_calibration/model_pipeline.py", line 590, in non_splitting_models
    fitted_model = model_dict["model"].fit(X_train, y_train)
  File "/Users/martin/Programming/compression_selection_v3/hyrise_calibration/model_pipeline.py", line 209, in fit
    return self.regression.fit(X, y)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/lineartree/lineartree.py", line 187, in fit
    self._fit(X, y, sample_weight)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/lineartree/_classes.py", line 576, in _fit
    self._grow(X, y, sample_weight)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/lineartree/_classes.py", line 387, in _grow
    loss=loss)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/lineartree/_classes.py", line 285, in _split
    for feat in split_feat)
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/Users/martin/opt/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "/Users/martin/opt/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/Users/martin/opt/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
TypeError: config_context() got an unexpected keyword argument 'target_offload'

PS: I have already left a star. :D

opened by Bouncner 3

Option to specify features to use for splitting and for leaf models
Added two additional parameters:

split_features: Indices of features that can be used for splitting. Default all.

linear_features: Indices of features that are used by the linear models in the leaves. Default all except for categorical features

This implements a feature requested in https://github.com/cerlymarco/linear-tree/issues/2

Potential performance improvement: Currently the code still computes bins for all features and not only for those used for splitting.
opened by JonasRauch 3

Rationale for rounding during _parallel_binning_fit and _grow

I noticed that the implementations of _parallel_binning_fit and _grow internally round loss values to 5 decimal places. This makes the regression results dependent on the scale of the labels, as data with a lower natural loss value will result in many different splits of the data having the same loss when rounded to 5 decimal places. Is there a reason why this is the case?

This behavior can be observed by fitting a LinearTreeRegressor using the default loss function and multiplying the scale of the labels by a small number (like 1e-9). This will result in the regressor no longer learning any splits.
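
The scale dependence described above can be reproduced without the library. The sketch below only mimics the reported behavior by rounding a mean-squared-error loss to 5 decimal places. The helper rounded_mse and the toy numbers are assumptions for illustration, not code taken from _parallel_binning_fit or _grow.

```python
def rounded_mse(y_true, y_pred, decimals=5):
    """Mean squared error, rounded to a fixed number of decimal places."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return round(mse, decimals)


y = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.1, 3.9]  # hypothetical candidate with small errors
bad_fit = [2.0, 2.0, 3.0, 3.0]   # hypothetical candidate with larger errors

# On the original scale the two candidates are clearly distinguishable.
loss_good = rounded_mse(y, good_fit)
loss_bad = rounded_mse(y, bad_fit)

# After scaling every value by 1e-9, the squared errors are of order 1e-19
# or smaller, so both losses round to 0.0 and the candidates tie.
scale = 1e-9
y_tiny = [v * scale for v in y]
tiny_good = rounded_mse(y_tiny, [v * scale for v in good_fit])
tiny_bad = rounded_mse(y_tiny, [v * scale for v in bad_fit])

print(loss_good, loss_bad, tiny_good, tiny_bad)
```

When every candidate loss rounds to the same value, no split shows a positive improvement, which is consistent with the regressor learning no splits on rescaled labels.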

opened by session-id 2

ValueError: Invalid parameter linearforestregression for estimator Pipeline

Great work! I'm new to ML and stuck with this. I'm trying to combine pipeline and GridSearch to search for best possible hyperparameters for a model.

I got the following error:

Kindly help : )

opened by NousMei 2

Performance and possibility to split only on subset of features

Hey, I have been playing around a lot with your linear trees. Like them very much. Thanks!

Nevertheless, I am somewhat disappointed by the runtime performance. Compared to XGBoost Regressors (I know it's not a fair comparison) or linear regressions (also not fair), the linear tree is reeeeeaally slow. 50k observations, 80 features: 2s for linear regression, 27s for XGBoost, and 300s for the linear tree. Have you seen similar runtimes or might I be using it wrong?

Another aspect that's interesting to me is whether it is possible to limit the features that are used for splits. I haven't found it in the code. Any chance to see it in the future?

opened by Bouncner 2

export to graphviz - AttributeError: 'LinearTreeRegressor' object has no attribute 'n_features_'

Hi

thanks for writing this great package!

I was trying to display the decision tree with graphviz, but I get this error:

AttributeError: 'LinearTreeRegressor' object has no attribute 'n_features_'

from lineartree import LinearTreeRegressor
from sklearn.linear_model import LinearRegression

reg = LinearTreeRegressor(base_estimator=LinearRegression())
reg.fit(train[x_cols], train["y"])

from graphviz import Source
from sklearn import tree

graph = Source(tree.export_graphviz(reg, out_file=None, feature_names=train.columns))

opened by ricmarchao 2

numpy deprecation warning

/lineartree/_classes.py:338: DeprecationWarning:

the interpolation= argument to quantile was renamed to method=, which has additional options. Users of the modes 'nearest', 'lower', 'higher', or 'midpoint' are encouraged to review the method they. (Deprecated NumPy 1.22)

Seems like a quick update here would get this warning to stop showing up, right? I can always ignore it, but figured I would mention it in case it is actually an error on my side.

Also, sorry, I don't actually know what the best open source etiquette is. If I'm supposed to create a pull request with a proposed fix instead of just mentioning it, then feel free to correct me.

opened by paul-brenner 1

How to gridsearch tree and regression parameters?

Hi, I am wondering how to perform a GridSearchCV to find the best parameters for both the tree and the regression model. For now I am only able to tune the tree component of my model:

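from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from lineartree import LinearForestRegressor

# Grid over the forest (tree) hyperparameters only.
param_grid = {
    'n_estimators': [50, 100, 500, 700],
    'max_depth': [10, 20, 30, 50],
    'min_samples_split': [2, 4, 8, 16, 32],
    'max_features': ['sqrt', 'log2', None],
}

# 3 folds repeated 3 times, with a fixed seed for reproducibility.
cv = RepeatedKFold(n_repeats=3,
                   n_splits=3,
                   random_state=1)

model = GridSearchCV(
    LinearForestRegressor(ElasticNet(random_state=0), random_state=42),
    param_grid=param_grid,
    n_jobs=-1,
    cv=cv,
    scoring='neg_root_mean_squared_error',
)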

opened by zuzannakarwowska 1

Potential bug in LinearForestClassifier 'predict_proba'

Hello! Thank you for the useful package!

I think I might have found a potential bug in LinearForestClassifier.

I expected 'predict_proba' to use 'self.decision_function', similarly to 'predict', so that it includes the predictions from both estimators (base + forest). Is that a potential bug, or am I wrong here?

https://github.com/cerlymarco/linear-tree/blob/8d5beca8d492cb8c57e6618e3fb770860f28b550/lineartree/lineartree.py#L1560

opened by PiotrKaszuba 1

Releases(0.3.5)

0.3.5(Aug 24, 2022)

simple code updates
0.3.4(Jul 21, 2022)

Added min_impurity_decrease in LinearTreeRegressor/Classifier.
0.3.3(May 15, 2022)

0.3.2(Mar 13, 2022)

Added scikit-learn 0.24.2 as min required version
0.3.1(Oct 1, 2021)

Add compatibility with sklearn 1.0
0.3.0(Sep 1, 2021)

Introduced Linear Forest
0.2.0(Aug 27, 2021)

Introduced Linear Boosting and improved Linear Tree classes
0.1.2(May 16, 2021)


Owner

Marco Cerliani

Statistician Hacker & Data Scientist
