distfit - Probability density fitting

Last update: Dec 30, 2022

Overview

Background

distfit is a Python package for probability density fitting across 89 univariate distributions to non-censored data by residual sum of squares (RSS), and hypothesis testing. Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. distfit scores each of the 89 distributions for its fit with the empirical distribution and returns the best-scoring distribution.
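
To make the scoring idea concrete, the sketch below fits three scipy.stats distributions to a sample and ranks them by the RSS between the normalized histogram and the fitted PDF. It is a minimal illustration under assumptions, not distfit's internal code: the bin count of 50, the helper name rss_score, and the three candidate distributions are choices made here for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(0, 2, 1000)

# Empirical density: normalized histogram evaluated at the bin centers
density, edges = np.histogram(X, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2


def rss_score(dist, data, centers, density):
    """Fit `dist` with scipy and return the RSS between its PDF and the histogram."""
    params = dist.fit(data)
    pdf = dist.pdf(centers, *params)
    return float(np.sum((density - pdf) ** 2))


# Hypothetical candidate list, kept short for the example
candidates = ("norm", "expon", "uniform")
scores = {name: rss_score(getattr(stats, name), X, centers, density) for name in candidates}
best = min(scores, key=scores.get)
print(scores)
print("best:", best)
```

On normally distributed input, norm should come out with the lowest RSS of these three candidates.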

Functionalities

The distfit library is created with classes to ensure simplicity in usage.

# Import library
from distfit import distfit

dist = distfit()        # Specify desired parameters
dist.fit_transform(X)   # Fit distributions on empirical data X
dist.predict(y)         # Predict the probability of the response variables
dist.plot()             # Plot the best fitted distribution (y is included if prediction is made)

Installation

Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.

Install from PyPI

pip install distfit

Install directly from github source (beta version)

pip install git+https://github.com/erdogant/distfit#egg=master

Install by cloning (beta version)

git clone https://github.com/erdogant/distfit.git
cd distfit
pip install -U .

Check version number

import distfit
print(distfit.__version__)

Examples

Import distfit library

from distfit import distfit

Create some random data and fit a model using default parameters:

import numpy as np
X = np.random.normal(0, 2, [100,10])
y = [-8,-6,0,1,2,3,4,5,6]

Specify `distfit` parameters. In this example only `todf=True` is set, which means that all other parameters keep their defaults.

dist = distfit(todf=True)
dist.fit_transform(X)
dist.plot()

# Prints to screen:
# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm      ] [RSS: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[expon     ] [RSS: 0.3911576] [loc=-6.213 scale=6.154] 
# [distfit] >[pareto    ] [RSS: 0.6755185] [loc=-7.965 scale=1.752] 
# [distfit] >[dweibull  ] [RSS: 0.0183543] [loc=-0.053 scale=1.726] 
# [distfit] >[t         ] [RSS: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[genextreme] [RSS: 0.0115116] [loc=-0.830 scale=1.964] 
# [distfit] >[gamma     ] [RSS: 0.0111372] [loc=-19.843 scale=0.209] 
# [distfit] >[lognorm   ] [RSS: 0.0111236] [loc=-29.689 scale=29.561] 
# [distfit] >[beta      ] [RSS: 0.0113012] [loc=-12.340 scale=41.781] 
# [distfit] >[uniform   ] [RSS: 0.2481737] [loc=-6.213 scale=12.281]

Note that the best fit should be [norm], as the input data was drawn from a normal distribution. However, many other distributions can fit very similarly with specific loc/scale parameters. It is not unusual to see the gamma and beta distributions score well, as these are the "Barbapapas" among the distributions: they can take on many shapes. Let's print the summary of detected distributions with the Residual Sum of Squares.

# All scores of the tested distributions
print(dist.summary)

# Distribution parameters for best fit
dist.model

# Make plot
dist.plot_summary()
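
The ranking can also be reproduced by hand from the printed scores. The sketch below copies the RSS values from the log output above into a pandas dataframe and sorts them. The dataframe layout and column names are assumptions for illustration and are not necessarily the layout of dist.summary.

```python
import pandas as pd

# RSS values copied from the printed output of the example above
summary = pd.DataFrame({
    "distr": ["norm", "expon", "pareto", "dweibull", "t",
              "genextreme", "gamma", "lognorm", "beta", "uniform"],
    "RSS": [0.0133619, 0.3911576, 0.6755185, 0.0183543, 0.0133619,
            0.0115116, 0.0111372, 0.0111236, 0.0113012, 0.2481737],
})

# A lower RSS means a closer match between the fitted PDF and the histogram
ranked = summary.sort_values("RSS").reset_index(drop=True)
best_name = ranked.loc[0, "distr"]
print(ranked.head(3))
print("best:", best_name)
```

For this particular run lognorm, gamma and beta score marginally better than norm, which illustrates how close flexible distributions can get to a normal sample.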

After we have a fitted model, we can make some predictions using the theoretical distributions. After making some predictions, we can plot again but now the predictions are automatically included.

dist.predict(y)
dist.plot()
# 
# Prints to screen:
# [distfit] >predict..
# [distfit] >Multiple test correction..[fdr_bh]

The results of the prediction are stored in dist.results under the keys y_proba and y_pred.

# Show the predictions for y
print(dist.results['y_pred'])
# ['down' 'down' 'none' 'none' 'none' 'none' 'up' 'up' 'up']

# Show the probabilities for y that belong with the predictions
print(dist.results['y_proba'])
# [2.75338375e-05 2.74664877e-03 4.74739680e-01 3.28636879e-01 1.99195071e-01 1.06316132e-01 5.05914722e-02 2.18922761e-02 8.89349927e-03]
 
# All predicted information is also stored in a structured dataframe
print(dist.results['df'])
#    y   y_proba y_pred         P
# 0 -8  0.000028   down  0.000003
# 1 -6  0.002747   down  0.000610
# 2  0  0.474740   none  0.474740
# 3  1  0.328637   none  0.292122
# 4  2  0.199195   none  0.154929
# 5  3  0.106316   none  0.070877
# 6  4  0.050591     up  0.028106
# 7  5  0.021892     up  0.009730
# 8  6  0.008893     up  0.002964
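
In the dataframe above, the y_proba column is numerically consistent with a Benjamini-Hochberg (fdr_bh) adjustment of the P column. The sketch below checks this with a hand-written adjustment in NumPy. The function name benjamini_hochberg is hypothetical, and this is an observation about the printed numbers rather than a description of distfit's internals.

```python
import numpy as np


def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# P and y_proba columns copied from the dataframe printed above (rounded values)
P = [0.000003, 0.000610, 0.474740, 0.292122, 0.154929,
     0.070877, 0.028106, 0.009730, 0.002964]
y_proba = [0.000028, 0.002747, 0.474740, 0.328637, 0.199195,
           0.106316, 0.050591, 0.021892, 0.008893]

adjusted = benjamini_hochberg(P)
print(np.round(adjusted, 6))
```

Because the inputs are rounded to six decimals, the adjusted values only match the y_proba column up to that rounding.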

Example if you want to test one specific distribution, such as the normal distribution:

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

dist = distfit(distr='norm')
dist.fit_transform(X)

# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm] [RSS: 0.0151267] [loc=0.103 scale=2.028]

dist.plot()

Example if you want to test multiple distributions, such as the normal, t and uniform distributions:

The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html

dist = distfit(distr=['norm', 't', 'uniform'])
results = dist.fit_transform(X)

# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm   ] [0.00 sec] [RSS: 0.0012337] [loc=0.005 scale=1.982]
# [distfit] >[t      ] [0.12 sec] [RSS: 0.0012336] [loc=0.005 scale=1.982]
# [distfit] >[uniform] [0.00 sec] [RSS: 0.2505846] [loc=-6.583 scale=15.076]
# [distfit] >Compute confidence interval [parametric]

Example to fit a discrete distribution:

from scipy.stats import binom
# Generate random numbers

# Set parameters for the test-case
n = 8
p = 0.5

# Generate 10000 samples of the distribution of (n, p)
X = binom(n, p).rvs(10000)
print(X)

# [5 1 4 5 5 6 2 4 6 5 4 4 4 7 3 4 4 2 3 3 4 4 5 1 3 2 7 4 5 2 3 4 3 3 2 3 5
#  4 6 7 6 2 4 3 3 5 3 5 3 4 4 4 7 5 4 5 3 4 3 3 4 3 3 6 3 3 5 4 4 2 3 2 5 7
#  5 4 8 3 4 3 5 4 3 5 5 2 5 6 7 4 5 5 5 4 4 3 4 5 6 2...]

# Initialize distfit for discrete distribution for which the binomial distribution is used. 
dist = distfit(method='discrete')

# Run distfit and determine whether we can find the parameters from the data.
dist.fit_transform(X)

# [distfit] >fit..
# [distfit] >transform..
# [distfit] >Fit using binomial distribution..
# [distfit] >[binomial] [SSE: 7.79] [n: 8] [p: 0.499959] [chi^2: 1.11]
# [distfit] >Compute confidence interval [discrete]

# Get the model and best fitted parameters.
print(dist.model)

# {'distr': <scipy.stats distribution object>,
#  'params': (8, 0.4999585504197037),
#  'name': 'binom',
#  'SSE': 7.786589839641551,
#  'chi2r': 1.1123699770916502,
#  'n': 8,
#  'p': 0.4999585504197037,
#  'CII_min_alpha': 2.0,
#  'CII_max_alpha': 6.0}

# Best fitted n=8 and p=0.4999 which is great because the input was n=8 and p=0.5
dist.model['n']
dist.model['p']

# Make plot
dist.plot()

# With the fitted model we can start making predictions on new unseen data
y = [0, 1, 10, 11, 12]
results = dist.predict(y)

# Make plot with the results
dist.plot()

# Collect the prediction results in a dataframe
import pandas as pd
df_results = pd.DataFrame(results)

#   y   y_proba    y_pred   P
#   0   0.004886   down     0.003909
#   1   0.035174   down     0.035174
#   10  0.000000     up     0.000000
#   11  0.000000     up     0.000000
#   12  0.000000     up     0.000000
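
As a sanity check on the binomial example above, p can also be estimated directly from the sample mean when n is known, since the mean of a binomial distribution is n*p. The sketch below uses only NumPy and scipy. It is a simple moment estimate under the assumption that n=8 is known, not the fitting procedure distfit uses.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(42)
n = 8
X = rng.binomial(n, 0.5, size=10_000)

# Moment estimate: E[X] = n * p, so p is approximately mean(X) / n
p_hat = X.mean() / n

# Compare observed counts with the counts expected under binom(n, p_hat)
values = np.arange(n + 1)
observed = np.bincount(X, minlength=n + 1)
expected = binom(n, p_hat).pmf(values) * X.size

# Probability of observing more than 9 successes out of n=8 trials
upper_tail = binom(n, p_hat).sf(9)

print("p_hat:", round(p_hat, 4))
print("observed:", observed)
print("expected:", np.round(expected, 1))
print("P(X > 9):", upper_tail)
```

The zero upper-tail probability corresponds to the rows for y = 10, 11 and 12 in the table above: values beyond n cannot occur under the fitted binomial.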

Example to generate samples based on the fitted distribution:

# Import libraries
import numpy as np
from distfit import distfit

# Generate random normal distributed data
X = np.random.normal(0, 2, 10000)
dist = distfit()

# Fit
dist.fit_transform(X)

# The fitted distribution can now be used to generate new samples.
# Generate samples
Xgenerate = dist.generate(n=1000)
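
Conceptually, generating samples from a fitted distribution amounts to drawing random variates from the distribution with the estimated parameters. The sketch below shows that idea with scipy.stats only. It assumes the normal distribution is the best fit and is not a description of how dist.generate is implemented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(0, 2, 10000)

# Estimate loc and scale of a normal distribution from the data
loc, scale = stats.norm.fit(X)

# Freeze the distribution with the fitted parameters and draw new samples
fitted = stats.norm(loc=loc, scale=scale)
X_new = fitted.rvs(size=1000, random_state=1)

print("loc:", round(loc, 3), "scale:", round(scale, 3))
print("new samples:", X_new.shape)
```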

Citation

Please cite distfit in your publications if this is useful for your research. See right top panel for the citation entry.


Maintainer

Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
Contributions are welcome.

Comments

Fitting distribution for discrete/categorical data

Hi

Is it possible to fit a distribution with distfit library for a discrete variable? For example, let's say I have a survey that has 10 questions with possible values that go from 1 (poor) to 5 (excellent), and 100 persons take the survey.

Best regards

opened by ogreyesp 5

Can I use the best distribution as the true distribution of my data?

Here I used distfit to get a distribution that is the closest to my data, but not exactly. When I use the kstest from the scipy library to calculate the p-value to see if I can trust the distribution, the p-value is not ideal. Can I still use distfit to get a distribution to describe my data?

opened by yuanfuqiang456 3

in plot api, pass fig and ax to give more control to the user's code

Thanks for this great library.

Purpose of this modification: I have been using it with a multivariate time series dataset. Each dimension gets its own plot and wanted to make use of subplots to see all the dimensions at the same time (in a grid for e.g.)

Notes: a) I have added fig as the parameter to the plotting API as well. Generally, it is not required. I have done it so as to not create a situation where the number of return values is 1. This way your function always return 2 values (the tuple).

b) Instead of using plt.xlim and plt.ylim, I am using ax.set_xlim & ax.set_ylim. This should work for previous version and for this modification as well.

c) For now if the method is 'discrete' then passed fig and axes are ignored since the plot_binom function creates subplots internally.

opened by ksachdeva 3

Add loggamma

I have a problem where loggamma fits best. I ran your script and my own custom script, they agree on beta parameters but the loggamma seemed much more natural. If it's not too much trouble, please consider adding this. If you are using scipy.stats, then it's the same API as others.

Cool project.

opened by tirthajyoti 3

Two questions about distfit

This project looks really great, thank you. I have two questions:

How do you set loc = 0 if you know that is the right value for it? I am trying to fit to a symmetric distribution.

When I try distfit with distr='full' it gets stuck at levy_l. Is this expected?

opened by lesshaste 3

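
On the first question, scipy.stats itself supports holding the location fixed during fitting through the floc keyword of the fit method. The sketch below shows this with scipy alone, using a normal distribution as an example of a symmetric distribution. Whether and how distfit exposes this option is not stated in this document, so treat it as a scipy-level workaround.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
X = rng.normal(0, 2, 5000)

# floc=0 keeps loc fixed at 0, so only the scale is estimated
loc, scale = stats.norm.fit(X, floc=0)

print("loc:", loc, "scale:", round(scale, 3))
```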
Plots are not generated

Hi,

Both dist.plot() and dist.plot_summary() do not generate plots for me. I am using the bare version of Python (i.e no Conda etc.)

Am I missing something?

Regards,

Danish

opened by danishTUE 2

T Distribution Weirdness

We are using distfit to try to determine if some data we have can be modelled parametrically. For some of the data, the best fitting distribution was a t. Scale and loc are clearly documented, and that is great. There is one remaining parameter to fit a t distribution, and that is degrees of freedom. Except, the one parameter in the distfit output that isn't a scale or loc value is less than one. Obviously, degrees of freedom can't be less than one. So what is that parameter and why isn't degrees of freedom included in the output? It would be helpful for automating our process.

opened by angelgeek 2
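
For context on the question above: when a t distribution is fitted with scipy.stats, the fit method returns the shape parameter (degrees of freedom) first, followed by loc and scale, and scipy accepts any positive value for the degrees of freedom, including values below one. The sketch below shows the parameter order with scipy only. How distfit labels this value in its output is not confirmed here.

```python
import numpy as np
from scipy import stats

# Draw from a t distribution with known parameters
X = stats.t.rvs(df=5, loc=1.0, scale=2.0, size=5000, random_state=3)

# scipy returns shape parameters first, then loc and scale
params = stats.t.fit(X)
df_hat, loc_hat, scale_hat = params

print("df:", round(df_hat, 2), "loc:", round(loc_hat, 2), "scale:", round(scale_hat, 2))
```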

Save best parameters

Hello, your package is really useful, thanks a lot!

I have a question: If I want to print the best parameters, what's the syntax? For example, I want to print the best n and p for binomial distribution for the following work.

thanks a lot

opened by hummm310 2

Remove plt.show() calls

Thank you for your time spent making this package.

When you call plt.show(), you've rendered the plot and it can no longer be modified by the user, making it pointless to return the figure and axes objects.

For example, try:

fig, ax = dist.plot()
ax.axvline(x=0)
plt.title("Blarg!")

Unlike sns plots and dataframe.plot() calls that many are familiar with, the plots of distfit cannot be modified after called. This is surprising to the user (at least it was to me 😀)

opened by isosphere 2

The `distr` parameter should accept a list

The distr parameter in your core distfit class should accept a custom list of distributions that the user wants to run fitting on. Is there a specific reason you have not allowed it to accept a list?

opened by tirthajyoti 2

`generate` or `rvs` method?

Do you plan to have a generate or rvs method added to a fitted dist class to generate a given number (chosen by a size parameter) of new points with the best-fitted distribution? Here is the imagined code (say I have a dataset called dataset)

dist = distfit(todf=True)
dist.fit_transform(dataset)
# Newly generated 1000 points from the best-fitted distribution (based on some score criteria)
new_data = dist.generate(size=1000)

opened by tirthajyoti 2

Robustness of selected data models

Good day!

Guys, I have found your package really cool) Thanks a lot)

I have a question:

Our incoming data can be with anomalies, noise. So, quality of our results is vulnerable to strong/weak outliers. Work with outliers is key feature of your package. Consequently, the quality of predictions based on our data model can be severely compromised. In a sense, we are training and predicting from the same data.

What is your advice?

I understand that, it is largely dependent on and provided by the nature of one or another theoretical distribution of data.

But, better to know, your personal opinion as authors...

opened by datason 1

Add K distribution

What a really awesome repository!

By the way, the K distribution is widely used in the field of radar and sonar. It is necessary to estimate the parameters of the K distribution.

Please consider adding this distribution if possible.

opened by ShaofengZou 3

KS-test in fitdist

Hello everyone,

I noticed in the code erdogant/distfit/distfit.py that whenever you use the KS statistical test (stats=ks), you call the scipy.stats.ks_2samp to test your data against the distribution you estimated through MLE (maximum likelihood estimation). Is that true? If so, this is wrong, because now the KS statistic depends on your data and the test is no longer valid. In such a case, I would recommend you to have a look at parametric/non-parametric bootstrapping to solve the issue. This reference could be useful https://ui.adsabs.harvard.edu/abs/2006ASPC..351..127B/abstract

opened by marcellobullo 10
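
The parametric bootstrap suggested in the comment above can be sketched as follows: fit the distribution, compute the KS statistic against the fitted CDF, then repeat the fit-and-test step on samples simulated from the fitted distribution to build a null distribution for the statistic. This is an illustrative sketch with scipy only. The function name bootstrap_ks_pvalue and the default of 200 bootstrap rounds are assumptions made here, and none of this is part of distfit.

```python
import numpy as np
from scipy import stats


def bootstrap_ks_pvalue(data, dist=stats.norm, n_boot=200, seed=0):
    """Parametric-bootstrap p-value for a KS test with estimated parameters."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # KS statistic of the data against the distribution fitted to the same data
    params = dist.fit(data)
    observed = stats.kstest(data, dist.cdf, args=params).statistic

    # Null distribution: simulate from the fitted model, refit, recompute
    null_stats = np.empty(n_boot)
    for i in range(n_boot):
        sample = dist.rvs(*params, size=data.size, random_state=rng)
        boot_params = dist.fit(sample)
        null_stats[i] = stats.kstest(sample, dist.cdf, args=boot_params).statistic

    # Add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null_stats >= observed)) / (n_boot + 1)


rng = np.random.default_rng(5)
p_normal = bootstrap_ks_pvalue(rng.normal(0, 2, 300))
p_expon = bootstrap_ks_pvalue(rng.exponential(1.0, 300))
print("normal data:", p_normal)
print("exponential data:", p_expon)
```

Refitting inside the loop is the key step: it makes the null distribution account for the fact that the parameters were estimated from the same data being tested.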

Releases(1.4.5)

1.4.5(May 10, 2022)
Some code refactoring and cleaning

Added test statistic name in the title of the figure

1.4.4(Mar 19, 2022)
Changed title text of plot with scientific notation.


1.4.3(Mar 19, 2022)

alpha parameter added to the predict function.
Output contains y_bool, which is y_proba<=alpha

import numpy as np
from distfit import distfit
X = np.random.normal(0, 2, 1000)
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]

dist = distfit()
dist.fit_transform(X)
results = dist.predict(y, alpha=0.01)
results['y_bool']
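
The y_bool output described above is a plain threshold on the adjusted probabilities. The sketch below reproduces that rule in NumPy. It reuses the y_proba values printed in the README example earlier in this document rather than values from the release snippet, whose random input would give different numbers.

```python
import numpy as np

y = np.array([-8, -6, 0, 1, 2, 3, 4, 5, 6])

# y_proba values copied from the README prediction example
y_proba = np.array([2.75338375e-05, 2.74664877e-03, 4.74739680e-01,
                    3.28636879e-01, 1.99195071e-01, 1.06316132e-01,
                    5.05914722e-02, 2.18922761e-02, 8.89349927e-03])

alpha = 0.01
y_bool = y_proba <= alpha   # True where the probability is at or below alpha
significant = y[y_bool]
print(y_bool)
print(significant)
```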


1.4.2(Nov 30, 2021)
added doi

1.4.1(Oct 1, 2021)
Pass fig and ax to give the user more control over plotting.

Thank you for the contribution @ksachdeva!
1.4.0(Mar 26, 2021)
New function "generate" that allows generating samples after fitting on the data.

Discrete output parameters aligned with output parameters of parametric models

New output variable added: "model" which is the fitted model based on loc/scale params. The "distr" remains the unfitted model.

Code generalized so that discrete and parametric runs share more of the same functions.

Different scoring statistics are now also possible for discrete fitting.

1.3.0(Mar 25, 2021)
output parameter in dict "RSS" changed into "score" because various scoring statistics can be chosen.

1.2.8(Mar 24, 2021)
Added possibility to use different scoring statistics. The parameter "stats" is to be used to define the scoring statistic: RSS, wasserstein, Kolmogorov-Smirnov statistic (ks) or energy
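
For readers unfamiliar with the alternative scoring statistics named above, the sketch below computes the Wasserstein distance, the two-sample Kolmogorov-Smirnov statistic and the energy distance between a data sample and a sample drawn from a fitted normal distribution, using functions that exist in scipy.stats. It only illustrates what the statistics measure. How distfit evaluates them internally is an assumption not covered here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
X = rng.normal(0, 2, 2000)

# Fit a normal distribution and draw a reference sample from it
loc, scale = stats.norm.fit(X)
reference = stats.norm.rvs(loc=loc, scale=scale, size=2000, random_state=11)

# All three statistics are distances: smaller means the samples are more alike
scores = {
    "wasserstein": stats.wasserstein_distance(X, reference),
    "ks": stats.ks_2samp(X, reference).statistic,
    "energy": stats.energy_distance(X, reference),
}
print(scores)
```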

1.2.7(Mar 21, 2021)
Fitting of discrete non-negative integer data is possible now using the binomial distribution!

Updated sphinx pages

Updated readme

Updated notebook

from distfit import distfit
dist = distfit(method='discrete')
dist.fit_transform(X)
1.2.6(Feb 7, 2021)
Improved speed

Outputs time duration for fitting distribution

Dataframe is not a default output anymore. Nevertheless, it can be returned using the todf=True setting during initialization

Removed unsupported distributions: frechet_r and frechet_l

Update docs and readme

Smoothline function integrated instead of separate file.

1.1.6(Oct 17, 2020)
pypickle used

pep styling

tested for python 3.8

1.1.5(Jul 14, 2020)
removed levy_l and stable from full list of distributions because they are too slow.

1.1.4(Jun 26, 2020)
quantile added as new method

percentile added as new method

1.1.3(Jun 18, 2020)
typo fix in naming

some fixes in examples.py

scipy library added to setup

docstring updates

1.1.2(May 12, 2020)
predict returns a dict

fit_transform returns a dict

1.1.1(Apr 28, 2020)
save and loading

sphinx updates

docstring updates

smoothing

1.1.0(Apr 27, 2020)
Fast update after previous version where the following is done:

input parameters changed: distribution into distr.

various smaller changes

1.0.0(Apr 27, 2020)
Huge update to version 1.0.0! distfit is becoming easier to use as the whole code is now rewritten in classes!

distfit code refactored with classes

examples updated

unit test updated

docstrings updated

0.1.6(Apr 7, 2020)
seems that the previous release was not correctly released.

0.1.5(Feb 7, 2020)
bug fix in proba_parameteric.

0.1.4(Feb 7, 2020)
Bug fix in finding best distribution!

Bug fix in plot

New unit test to avoid this in the future.

0.1.3(Feb 6, 2020)
unit tests

summary plot

refactoring

code cleaning

plot improvements

v0.1.2(Jan 29, 2020)

0.1.1(Jan 24, 2020)
Code refactoring

Code cleaning

plot possible for .proba_emperical() with .plot()

0.1.0(Jan 5, 2020)


Owner

Erdogan Taskesen

GitHub Repository https://erdogant.github.io/distfit

