Machine learning model evaluation made easy: plots, tables, HTML reports, experiment tracking and Jupyter notebook analysis.

Overview
Comments
  • Quickstart clustering

    Quickstart clustering

    Adds quick start for clustering. Note that I had to make some changes to the tests and the elbow curve implementation since I found minor issues: hardcoded figure size, missing n_clusters in the title and hardcoded random seed.

    opened by edublancas 10
  • new ROC api added to plot

    new ROC api added to plot

    Describe your changes

    • New ROC API (inherits from Plot)
    • plot.ROC.__add__ added for generating overlapping curves
    • The old roc API is still supported

    Issue ticket number and link

    Closes #84

    Checklist before requesting a review

    • [x] I have performed a self-review of my code
    • [x] I have added thorough tests (when necessary).
    • [x] I have added the right documentation (when needed). Product update? If yes, write one line about this update.
    opened by yafimvo 8
  • minor changes to silhouette_plot

    minor changes to silhouette_plot

    I was going to release a new version with the silhouette_plot @neelasha23 but noticed a few things.

    Our convention is not to include the word plot in the function names (since they're all in the plot, module, can you rename them?

    silhouette_plot -> silhouette silhouette_plot_from_results -> silhouette_from_results

    Also, please include 0.8.3 as the version when this plots became available, in case anyone is using an older version. This way they'll know they have to update, you can add a .. versionadded:: in a Notes section in the plot's docstring

    https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-versionadded

    FYI: @idomic

    opened by edublancas 7
  • Inconsistency in image comparison

    Inconsistency in image comparison

    The results of matplotlib's @image_comparison are a bit inconsistent sometimes (behaving differently in local vs CI). Maybe we can aim to build a custom utility for comparing images from plots.

    opened by neelasha23 7
  • Bug: Missing colab flag

    Bug: Missing colab flag

    on some of the stats calls, the colab flag is missing. This makes it difficult to understand how many of the users are actually in colab or just plain docker.

    opened by idomic 6
  • docs broken

    docs broken

    looks like @neelasha23's last PR broke the documentation because of a change in sklearn:

    PapermillExecutionError: 
    ---------------------------------------------------------------------------
    Exception encountered at "In [1]":
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    Cell In[1], line 3
          1 import importlib
    ----> 3 from sklearn.datasets import load_boston
          4 from sklearn.model_selection import train_test_split
          5 from sklearn import metrics
    
    File ~/checkouts/readthedocs.org/user_builds/sklearn-evaluation/conda/latest/lib/python3.8/site-packages/sklearn/datasets/__init__.py:156, in __getattr__(name)
        105 if name == "load_boston":
        106     msg = textwrap.dedent(
        107         """
        108         `load_boston` has been removed from scikit-learn since version 1.2.
       (...)
        154         """
        155     )
    --> 156     raise ImportError(msg)
        157 try:
        158     return globals()[name]
    
    ImportError: 
    `load_boston` has been removed from scikit-learn since version 1.2.
    
    

    FYI @idomic

    opened by edublancas 5
  • Installing sklearn_evaluation

    Installing sklearn_evaluation

    I used "pip install sklearn-evaluation" to install this library in anaconda. All requirements exit but it does not install. When I want to import it, there is no library. When I run pip command to install it, it does not access to install, nor install anything.

    opened by AminShah69 5
  • Incompatibility with sklearn 0.20.0

    Incompatibility with sklearn 0.20.0

    Hi. I was trying to use this package with the up-to-dated version of scikit (0.20.0) but I did not understand how to do it. In particular, I was trying to use

    from sklearn_evaluation import plot plot.grid_search(gridCV.grid_scores_, change=change,kind='bar')

    but the member grid_scores_ does not exist any more (present till scikit 0.17) and has been substituted by cv_results_, which returns an object of different data type with respect to the former member. Is there an easy way to go on using this function by using the new cv_results_ in place of grid_scores_? Thank you.

    opened by mfaggin 5
  • refactor plots for better integration with tracker

    refactor plots for better integration with tracker

    In sklearn-evaluation 0.8.2, I introduced two new methods to the SQL experiment tracker: log_confusion_matrix and log_classification_report. These two methods allow users to store plots in the SQLite database and retrieve them later.

    However, unlike previous versions, we're not storing the actual plot in the database, but the statistics we need to re-create the plot. For example, to re-create a confusion matrix, we can store the numbers on each quadrant. The benefit of this approach is that we can serialize and unserialize the plots as objects and allow the user to combine them for better comparison. See this example.

    Enabling this involves several changes in the plotting code since we need to split the part that computes the statistics to display from the code that generates the plot, and this has to be performed for each plot (so far, only confusion matrix and classification report have been refactored)

    The purpose of this issue is to start refactoring other popular plots. We still need to support the old API (e.g., plot.confusion_matrix), but it should use the object-oriented API under the hood (e.g., plot.ConfusionMatrix)

    The next one we can implement is the ROC curve. All classes should behave similarly; here are some pointers:

    • the class constructor should take the data needed to generate the plot (fpr and tpr as returned by roc_curve)
    • No need to implement __sub__ - not applicable for ROC. just raise a NotImplementedError with an appropriate error message
    • __add__ should create a new plot with overlapping ROC curves. This translates into users being able to do roc1 + roc2 to generated the overlapping plot
    • the _get_data method should return the data needed to re-create the plot (example)
    • the from_dump class method should re-create a plot from a dumped json file (note that the dump method is implemented in the parent class
    opened by edublancas 4
  • SKLearnEvaluationLogger added

    SKLearnEvaluationLogger added

    Describe your changes

    SKLearnEvaluationLogger decorator wraps telemetry log_api functionality and allows to generate logs for sklearn-evaluation as follows:

    @SKLearnEvaluationLogger.log(feature='plot')
    def confusion_matrix(
            y_true,
            y_pred,
            target_names=None,
            normalize=False,
            cmap=None,
            ax=None,
            **kwargs):
    pass
    

    this will generate the following log:

            {
              "metadata": {
              "action": "confusion_matrix"
              "feature": "plot",
              "args": {
                            "target_names": "None",
                            "normalize": "False",
                            "cmap": "None",
                            "ax": "None"
                        }
              }
            }
    

    ** since y_true and y_pred are positional arguments without default values it won't log them

    we can also use pre-defined flags when calling a function

            return plot.confusion_matrix(self.y_true, self.y_pred, self.target_names, ax=_gen_ax())
    

    which will generate the following log:

            "metadata": {
                "action": "confusion_matrix"
                "feature": "plot",
                "args": {
                    "target_names": "['setosa', 'versicolor', 'virginica']",
                    "normalize": "False",
                    "cmap": "None",
                    "ax": "AxesSubplot(0.125,0.11;0.775x0.77)"
                }
            },
    

    Queries

    Run queries and filter out sklearn-evaluation events by the event name: sklearn-evaluation Break these events by feature ('plot', 'report', 'SQLiteTracker', 'NotebookCollection') Break events by actions (i.e: 'confusion_matrix', 'roc', etc...) and/or flags ('is_report')

    Errors

    Failing runnings will be named: sklearn-evaluation-error

    Checklist before requesting a review

    • [X] I have performed a self-review of my code
    • [X] I have added thorough tests (when necessary).
    • [] I have added the right documentation (when needed). Product update? If yes, write one line about this update.
    opened by yafimvo 4
  • GridSearch heatmap for 'None' parameter

    GridSearch heatmap for 'None' parameter

    When I try to generate a heatmap for GridSearchCV results, if the parameter has 'None' type, it gives error: TypeError: '<' not supported between instances of 'NoneType' and 'int'

    The parameter can be, for e.g. max_depth_for_decision_trees = [3, 5, 10, None].

    Is there any workaround for this?

    opened by shrsulav 4
  • doc intro is empty

    doc intro is empty

    our intro page is empty: https://sklearn-evaluation.ploomber.io/en/latest/intro.html

    we should briefly describe the features in the library (possibly with some short examples) and add links to our quick starts

    opened by edublancas 1
  • ConfusionMatrix fix.

    ConfusionMatrix fix.

    Adresses #145

    Restructured ConfusionMatrix class to include a plot method that plots data and axes to a matplotlib figure and returns a ConfusionMatrix class object. An object is returned so as to not break the addition and subtraction functions in the class. The figure is a matplotlib object and can be resized using matplotlib methods. The figure is accessed by the figure attribute of the class instance.

    Example:

    tree_cm = plot.ConfusionMatrix.from_raw_data(y_test, tree_pred, normalize=False) # Creates a ConfusionMatrix class instance tree_cm.figure.set_size_inches(5,5) # Resizes the figure to 5 by 5 inches tree_cm.figure # Outputs the figure contained in class instance

    opened by digithed 1
  • documenting alternatives to elbow curve

    documenting alternatives to elbow curve

    I came across this paper, which suggests that the elbow method isn't the best for choosing the number of clusters. We should give it a read, look for other sources and incorporate some of this advice in our elbow curve documentation. We could implement the alternatives.

    opened by edublancas 0
  • Prediction error plot - issue in logic

    Prediction error plot - issue in logic

    The prediction error piece has this logic: model.fit(y_reshaped, y_pred). This looks incorrect. It's trying to fit 2 sets of y values whereas it should fit (X,y). Need to understand why this statement is here and rectify accordingly.

    opened by neelasha23 0
Releases(0.5.6)
Owner
Eduardo Blancas
Developing tools for reproducible Data Science.
Eduardo Blancas
Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along with material in the form of Jupyter Notebooks.

Databricks Certification Spark Databricks Certified Associate Spark Developer preparation toolkit to setup single node Standalone Spark Cluster along

19 Dec 13, 2022
Random Forest Classification for Neural Subtypes

Random Forest classifier for neural subtypes extracted from extracellular recordings from human brain organoids.

Michael Zabolocki 1 Jan 31, 2022
MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees.

MooGBT is a library for Multi-objective optimization in Gradient Boosted Trees. MooGBT optimizes for multiple objectives by defining constraints on sub-objective(s) along with a primary objective. Th

Swiggy 66 Dec 06, 2022
A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement.

Organic Alkalinity Sausage Machine A Python toolbox to churn out organic alkalinity calculations with minimal brain engagement. Getting started To mak

Charles Turner 1 Feb 01, 2022
Ml based project which uses regression technique to predict the price.

Price-Predictor Ml based project which uses regression technique to predict the price. I have used various regression models and finds the model with

Garvit Verma 1 Jul 09, 2022
The project's goal is to show a real world application of image segmentation using k means algorithm

The project's goal is to show a real world application of image segmentation using k means algorithm

2 Jan 22, 2022
A machine learning toolkit dedicated to time-series data

tslearn The machine learning toolkit for time series analysis in Python Section Description Installation Installing the dependencies and tslearn Getti

2.3k Jan 05, 2023
🔬 A curated list of awesome machine learning strategies & tools in financial market.

🔬 A curated list of awesome machine learning strategies & tools in financial market.

GeorgeZou 1.6k Dec 30, 2022
As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Crate will be the hub of various ML projects which will be the resources for the ML enthusiasts! Open Source Program: SWOC 2021 and JWOC 2022.

Machine Learning Loot Crate 💻 🧰 🔴 Welcome contributors! As we all know the BGMI Loot Crate comes with so many resources for the gamers, this ML Cra

Abhishek Sharma 89 Dec 28, 2022
This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

B DEVA DEEKSHITH 1 Nov 03, 2021
Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.

Time series analysis today is an important cornerstone of quantitative science in many disciplines, including natural and life sciences as well as eco

Christoph Mark 129 Dec 24, 2022
Official code for HH-VAEM

HH-VAEM This repository contains the official Pytorch implementation of the Hierarchical Hamiltonian VAE for Mixed-type Data (HH-VAEM) model and the s

Ignacio Peis 8 Nov 30, 2022
A single Python file with some tools for visualizing machine learning in the terminal.

Machine Learning Visualization Tools A single Python file with some tools for visualizing machine learning in the terminal. This demo is composed of t

Bram Wasti 35 Dec 29, 2022
Tools for mathematical optimization region

Tools for mathematical optimization region

林景 15 Nov 30, 2022
Predict the demand for electricity (R) - FRENCH

06.demand-electricity Predict the demand for electricity (R) - FRENCH Prédisez la demande en électricité Prérequis Pour effectuer ce projet, vous devr

1 Feb 13, 2022
Price forecasting of SGB and IRFC Bonds and comparing there returns

Project_Bonds Project Title : Price forecasting of SGB and IRFC Bonds and comparing there returns. Introduction of the Project The 2008-09 global fina

Tishya S 1 Oct 28, 2021
BigDL: Distributed Deep Learning Framework for Apache Spark

BigDL: Distributed Deep Learning on Apache Spark What is BigDL? BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can w

4.1k Jan 09, 2023
Examples and code for the Practical Machine Learning workshop series

Practical Machine Learning Workshop Series Practical Machine Learning for Quantitative Finance Post conference workshop at the WBS Spring Conference D

CompatibL 21 Jun 25, 2022
🤖 ⚡ scikit-learn tips

🤖 ⚡ scikit-learn tips New tips are posted on LinkedIn, Twitter, and Facebook. 👉 Sign up to receive 2 video tips by email every week! 👈 List of all

Kevin Markham 1.6k Jan 03, 2023
Python 3.6+ toolbox for submitting jobs to Slurm

Submit it! What is submitit? Submitit is a lightweight tool for submitting Python functions for computation within a Slurm cluster. It basically wraps

Facebook Incubator 768 Jan 03, 2023