Python-based GBDT implementation

Last update: Sep 21, 2022

Related tags

Machine Learning Py-Boost

Overview

Py-boost: a research tool for exploring GBDTs

Modern gradient boosting toolkits are highly complex and written in low-level programming languages. As a result:

It is hard to customize them to suit one’s needs.
New ideas and methods are not easy to implement.
It is difficult to understand how they work.

Py-boost is a Python-based gradient boosting library that aims to overcome these problems.

Authors: Anton Vakhrushev, Leonid Iosipoi.

Py-boost Key Features

Simple. Py-boost is a simplified gradient boosting library but it supports all main features and hyperparameters available in other implementations.

Fast with GPU. Although Py-boost is written in Python, it stays fast because it runs only on GPU, using Python GPU libraries such as CuPy and Numba.

Easy to customize. Py-boost can be customized even by those unfamiliar with GPU programming (just replace np with cp). What can be customized? Almost everything, via custom callbacks. Examples: row/column sampling strategy, training control, losses and metrics, multioutput handling strategy.
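The "replace np with cp" idea can be illustrated with a small sketch. This is not Py-boost's own API: the function name logloss_grad_hess and its xp parameter are hypothetical, chosen only to show how code written against NumPy can usually run on GPU by passing CuPy as the array module, since CuPy mirrors most of the NumPy interface.

```python
import numpy as np


def logloss_grad_hess(y_true, y_pred, xp=np):
    """Gradient and hessian of binary log loss w.r.t. raw scores.

    `xp` is the array module (hypothetical parameter for this sketch):
    pass numpy for CPU arrays or cupy for GPU arrays.
    """
    p = 1.0 / (1.0 + xp.exp(-y_pred))  # sigmoid of the raw scores
    grad = p - y_true                  # first derivative of the loss
    hess = p * (1.0 - p)               # second derivative of the loss
    return grad, hess


# CPU run with NumPy arrays
y_true = np.array([0.0, 1.0, 1.0])
y_pred = np.array([0.0, 0.0, 2.0])
grad, hess = logloss_grad_hess(y_true, y_pred)

# On a machine with a GPU and CuPy installed, the same function
# should work unchanged with CuPy arrays:
# import cupy as cp
# grad, hess = logloss_grad_hess(cp.asarray(y_true), cp.asarray(y_pred), xp=cp)
```

Passing the array module explicitly keeps the numerical code identical on CPU and GPU, which makes it easy to check a custom loss with NumPy before moving it to the GPU.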

Installation

Before installing py-boost via pip, you need CuPy installed. You can install both with:

pip install -U cupy-cuda110 py-boost

Note: replace cupy-cuda110 with the CuPy package for your CUDA version! For details, see this guide.

Quick tour

Py-boost is easy to use because its interface is similar to scikit-learn's. For usage examples, please see:

Tutorial_1_Basics for simple usage examples
Tutorial_2_Advanced_multioutput for advanced multioutput features
Tutorial_3_Custom_features for examples of customization

More examples are coming soon.

Other Sber AI Lab Projects

LightAutoML: https://github.com/sberbank-ai-lab/LightAutoML
AutoWoE: https://github.com/sberbank-ai-lab/AutoMLWhitebox
RePlay: https://github.com/sberbank-ai-lab/RePlay


Owner

Sberbank AI Lab
