An intuitive library to add plotting functionality to scikit-learn objects.

Overview

Welcome to Scikit-plot

PyPI version license Build Status PyPI DOI

Single line functions for detailed visualizations

The quickest and easiest way to go from analysis...

roc_curves

...to this.

Scikit-plot is the result of an unartistic data scientist's dreadful realization that visualization is one of the most crucial components in the data science process, not just a mere afterthought.

Gaining insights is simply a lot easier when you're looking at a colored heatmap of a confusion matrix complete with class labels rather than a single-line dump of numbers enclosed in brackets. Besides, if you ever need to present your results to someone (virtually any time anybody hires you to do data science), you show them visualizations, not a bunch of numbers in Excel.

That said, there are a number of visualizations that frequently pop up in machine learning. Scikit-plot is a humble attempt to provide aesthetically-challenged programmers (such as myself) the opportunity to generate quick and beautiful graphs and plots with as little boilerplate as possible.

Okay then, prove it. Show us an example.

Say we use Naive Bayes in multi-class classification and decide we want to visualize the results of a common classification metric, the Area under the Receiver Operating Characteristic curve. Since the ROC is only valid in binary classification, we want to show the respective ROC of each class if it were the positive class. As an added bonus, let's show the micro-averaged and macro-averaged curve in the plot as well.

Let's use scikit-plot with the sample digits dataset from scikit-learn.

# The usual train-test split mumbo-jumbo
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
nb = GaussianNB()
nb.fit(X_train, y_train)
predicted_probas = nb.predict_proba(X_test)

# The magic happens here
import matplotlib.pyplot as plt
import scikitplot as skplt
skplt.metrics.plot_roc(y_test, predicted_probas)
plt.show()

roc_curves

Pretty.

And... That's it. Encaptured in that small example is the entire philosophy of Scikit-plot: single line functions for detailed visualization. You simply browse the plots available in the documentation, and call the function with the necessary arguments. Scikit-plot tries to stay out of your way as much as possible. No unnecessary bells and whistles. And when you do need the bells and whistles, each function offers a myriad of parameters for customizing various elements in your plots.

Finally, compare and view the non-scikit-plot way of plotting the multi-class ROC curve. Which one would you rather do?

Maximum flexibility. Compatibility with non-scikit-learn objects.

Although Scikit-plot is loosely based around the scikit-learn interface, you don't actually need Scikit-learn objects to use the available functions. As long as you provide the functions what they're asking for, they'll happily draw the plots for you.

Here's a quick example to generate the precision-recall curves of a Keras classifier on a sample dataset.

# Import what's needed for the Functions API
import matplotlib.pyplot as plt
import scikitplot as skplt

# This is a Keras classifier. We'll generate probabilities on the test set.
keras_clf.fit(X_train, y_train, batch_size=64, nb_epoch=10, verbose=2)
probas = keras_clf.predict_proba(X_test, batch_size=64)

# Now plot.
skplt.metrics.plot_precision_recall_curve(y_test, probas)
plt.show()

p_r_curves

You can see clearly here that skplt.metrics.plot_precision_recall_curve needs only the ground truth y-values and the predicted probabilities to generate the plot. This lets you use anything you want as the classifier, from Keras NNs to NLTK Naive Bayes to that groundbreaking classifier algorithm you just wrote.

The possibilities are endless.

Installation

Installation is simple! First, make sure you have the dependencies Scikit-learn and Matplotlib installed.

Then just run:

pip install scikit-plot

Or if you want the latest development version, clone this repo and run

python setup.py install

at the root folder.

If using conda, you can install Scikit-plot by running:

conda install -c conda-forge scikit-plot

Documentation and Examples

Explore the full features of Scikit-plot.

You can find detailed documentation here.

Examples are found in the examples folder of this repo.

Contributing to Scikit-plot

Reporting a bug? Suggesting a feature? Want to add your own plot to the library? Visit our contributor guidelines.

Citing Scikit-plot

Are you using Scikit-plot in an academic paper? You should be! Reviewers love eye candy.

If so, please consider citing Scikit-plot with DOI DOI

APA

Reiichiro Nakano. (2018). reiinakano/scikit-plot: 0.3.7 [Data set]. Zenodo. http://doi.org/10.5281/zenodo.293191

IEEE

[1]Reiichiro Nakano, “reiinakano/scikit-plot: 0.3.7”. Zenodo, 19-Feb-2017.

ACM

[1]Reiichiro Nakano 2018. reiinakano/scikit-plot: 0.3.7. Zenodo.

Happy plotting!

Comments
  • Improve handling of unbalanced confusion matrices

    Improve handling of unbalanced confusion matrices

    Here I have made a few changes that make it easier to plot confusion matrices where the true and predicted sets of labels are not the same. This is a case that can occur when doing something like applying "new" categories to a dataset with an older set of categories.

    The changes included are the following:

    Fix an issue with nan values showing up when unbalanced confusion matrices are normalized. Where rows with zero entries would sum to zero and then divide by zero when normalizing each cell.

    Add options to limit the labels displayed on the true and predicted axes, as with unbalanced confusion matrices some of the labels can be only in the set of true labels or only in the set of predicted labels.

    You can see the effect of the new options here:

    import numpy as np
    import matplotlib.pyplot as plt
    import scikitplot as sciplt
    
    y_true = np.array(["A", "A", "B", "B", "B", "C", "D"])
    y_pred = np.array(["A", "A", "Ba", "Bb", "Ba", "C", "D"])
    
    print(y_true.shape)
    print(y_pred.shape)
    
    true_labels = np.unique(y_true)
    pred_labels = np.unique(y_pred)
    
    labels = np.sort(np.unique(np.concatenate([true_labels, pred_labels])))
    
    true_label_indexes = np.where(np.isin(labels, true_labels))
    pred_label_indexes = np.where(np.isin(labels, pred_labels))
    
    sciplt.plotters.plot_confusion_matrix(y_true, y_pred, hide_zeros=True, normalize=True, true_label_indexes=true_label_indexes, pred_label_indexes=pred_label_indexes, labels=labels)
    plt.show()
    

    figure_1

    opened by ExcaliburZero 12
  • 0.2.3 to 0.2.6 update failed

    0.2.3 to 0.2.6 update failed

    I've just tried to upgrade the package, but it gave the following error:

    Collecting scikit-plot
      Using cached scikit-plot-0.2.6.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-7wut1485/scikit-plot/setup.py", line 9, in <module>
            import scikitplot
          File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/__init__.py", line 5, in <module>
            from scikitplot.classifiers import classifier_factory
          File "/tmp/pip-build-7wut1485/scikit-plot/scikitplot/classifiers.py", line 5, in <module>
            import matplotlib.pyplot as plt
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/pyplot.py", line 115, in <module>
            _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/__init__.py", line 32, in pylab_setup
            globals(),locals(),[backend_name],0)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/matplotlib/backends/backend_tkagg.py", line 6, in <module>
            from six.moves import tkinter as Tk
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 92, in __get__
            result = self._resolve()
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 115, in _resolve
            return _import_module(self.mod)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/envs/work-3.6.1/lib/python3.6/site-packages/six.py", line 82, in _import_module
            __import__(name)
          File "/home/paulo/Programs/repos/pyenv/versions/3.6.1/lib/python3.6/tkinter/__init__.py", line 36, in <module>
            import _tkinter # If this fails your Python may not be configured for Tk
        ModuleNotFoundError: No module named '_tkinter'
        
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-7wut1485/scikit-plot/
    

    Unfortunately, I don't know how to debug this problem. If you need some info, please don't hesitate to ask!

    opened by paulochf 10
  • Adding a parameter to plot_confusion_matrix() to hide overlaid counts

    Adding a parameter to plot_confusion_matrix() to hide overlaid counts

    Hi @reiinakano,

    Thank you for this great repo! I am using plot_confusion_matrix() but my counts are quite large so the overlaid counts end up overlapping each other and result in a cluttered plot. I was wondering if I could submit a pull request to update this function to add a hide_counts parameter to give the option to not plot the counts? I've already forked and created a branch with the changes. Thank you!

    opened by echan5 7
  • Plot ONLY one class

    Plot ONLY one class

    Hello i have a precision-recall curve where i plot as the following:

    skplt.metrics.plot_precision_recall_curve(y_test, y_probas, curves=['each_class'])
    

    I have two classes in the data (one positive and one negative class with labels 1 and -1 respectively). Questions: How can I plot ONLY the positive class?

    Thank you

    enhancement help wanted 
    opened by foo123 7
  • Plot precision-recall curve for support vector machine classifier

    Plot precision-recall curve for support vector machine classifier

    Hello I want to plot a precision-recall curve for SVC (support vector machine classifier), but the scikit-learn svm classifier does not implement a predict_proba method. How can I do that in scikit-plot (as far as I can see in the documentation it accepts prediction probabilities to plot the curve)?

    Note that the scikit-learn documentation page has an example of precision-recall curve for SVC

    Thank you, Nikos

    opened by foo123 6
  • Add Jupyter notebook examples

    Add Jupyter notebook examples

    It would be nice to have Jupyter notebooks in the "examples" folder showing the different plots as used in a Jupyter notebook. It could contain the same exact code as the examples in the .py files, but adjusted for size (Jupyter notebook plots tend to come out much smaller).

    easy 
    opened by reiinakano 6
  • Update to plot_confusion_matrix (figsize argument and to work if Seaborn is used)

    Update to plot_confusion_matrix (figsize argument and to work if Seaborn is used)

    Using the confusion matrix in a jupyter notebook returns a plot that is quite small. If Seaborn is also used, some values in the plot are hard to read (white text on white lines).

    I have added a figsize-argument to the plot_confusion_matrix and changed the way that values are displayed (now with neutral background box). For larger plots all text-elements scale with the figsize-argument.

    opened by frankherfert 6
  • Adding argument to allow the user to specify which roc_curve are plotted

    Adding argument to allow the user to specify which roc_curve are plotted

    You tagged this issue #14 as help wanted so I thought I'd pitch in. Feel free to edit if it doesn't match your styIe.

    I added a little bit of code to allow the user to pass a list to the roc curve plotting functions to allow them to suppress/show each of the three types of curves: class-specific curves, micro averages, and macro averages.

    opened by doug-friedman 5
  • Error installing No module named sklearn.metrics

    Error installing No module named sklearn.metrics

    Hi there, I am getting an error installing it

    pip install scikit-plot                                                              ~ 1
    Collecting scikit-plot
      Downloading scikit-plot-0.2.1.tar.gz
        Complete output from command python setup.py egg_info:
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\setup.py", line 9, in <module>
            import scikitplot
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\__init__.py", line 5, in <module>
            from scikitplot.classifiers import classifier_factory
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\classifiers.py", line 7, in <module>
            from scikitplot import plotters
          File "c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\scikitplot\plotters.py", line 9, in <module>
            from sklearn.metrics import confusion_matrix
        ImportError: No module named sklearn.metrics
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in c:\users\arthur\.babun\cygwin\tmp\pip-build-yrgynz\scikit-plot\
    
    opened by ArthurZ 5
  • Throws error

    Throws error "IndexError: too many indices for array" when trying to plot roc for binary classification

    For binary classification, when I input numpy arrays having test label and test probabilities, it throws the following error :

    
    y_true = np.array(ytest)
    y_probas = np.array(p_test)
    skplt.metrics.plot_roc_curve(y_true,y_probas)
    plt.show()
    
    IndexError                                Traceback (most recent call last)
    <ipython-input-49-1b02f082006a> in <module>()
    ----> 1 skplt.metrics.plot_roc_curve(y_true,y_probas)
          2 plt.show()
    
    
    /Users/tarun/anaconda/envs/gl-env/lib/python2.7/site-packages/scikitplot/metrics.pyc in plot_roc_curve(y_true, y_probas, title, curves, ax, figsize, cmap, title_fontsize, text_fontsize)
        247     roc_auc = dict()
        248     for i in range(len(classes)):
    --> 249         fpr[i], tpr[i], _ = roc_curve(y_true, probas[:, i],
        250                                       pos_label=classes[i])
        251         roc_auc[i] = auc(fpr[i], tpr[i])
    
    IndexError: too many indices for array
    
    opened by TarunTater 4
  • Class mismatch in skplt.plot_confusion_matrix when test has fewer classes than training

    Class mismatch in skplt.plot_confusion_matrix when test has fewer classes than training

    Hello, I have an issue when trying to plot a confusion matrix fewer classes in my test set than in training. The class with 12 000+ occcurences in my sample should be labelled 'O' is it possible to get around this, or to include the label set manually as an input?

    image it's not a big issue but would be nice if we could fix it. Thanks for your help

    opened by ArmandGiraud 4
  • Regarding the scikit-plot.metrics.plot_roc function

    Regarding the scikit-plot.metrics.plot_roc function

    In you code I noticed that if we pass classes in the form of their actual meaning instead of (0,1,2 .. ) and we pass it as (c,b,a) then np.unique(y_true) makes the classes in the form of its alphabetical format and this changes the position of the classes that the model was trained on classes = np.unique(y_true)

    fpr_dict[i], tpr_dict[i], _ = roc_curve(y_true, probas[:, i], pos_label=classes[i])

    Hence if you could add a parameter of class_labels in the function

    def plot_roc_multi(y_true, y_probas,class_labels, title='ROC Curves', plot_micro=True, plot_macro=True, classes_to_plot=None, ax=None, figsize=None, cmap='nipy_spectral', title_fontsize="large", text_fontsize="medium"):

    where class_labels is in the form of an array [a,b,c] it would be much easier I think

    opened by Akshay1-6180 1
  • add class prediction error plot

    add class prediction error plot

    This is just a heads up for the programmers that this is a dead library and it functionalities may break any time. There has been not a single update form 2018 and most probably there will be no bug-fix or feature implementations.

    Either implement your own plotting functions or look at other active modules such as yellowbrick.

    scikit-plot is dead and it might break any time, if you use this in production.

    opened by bhishanpdl 0
  • Adding class_names option to gain and lift plots

    Adding class_names option to gain and lift plots

    I'd like to be able to set the names of the classes used in the legend in the the plot_lift_curve() and plot_cumulative_gain() functions.

    I've made a pull request (https://github.com/reiinakano/scikit-plot/pull/109) that adds an optional arg class_names to each of these functions to accomplish this.

    This differs from issue https://github.com/reiinakano/scikit-plot/issues/78 in that I am trying to change the labels in the plot's legend, not omit one of the classes from the plot.

    opened by MichaelFishmanOD 0
Releases(v0.3.7)
  • v0.3.7(Aug 19, 2018)

  • v0.3.5(May 12, 2018)

    New features:

    • plot_precision_recall_curve and plot_roc_curve have been deprecated for plot_precision_recall and plot_roc, respectively. The major difference is the deletion of the curves parameter and the use of plot_macro, plot_micro, and classes_to_plot to choose which curves should be plotted. Thanks to @lugq1990 for this change.
    Source code(tar.gz)
    Source code(zip)
  • v0.3.4(Feb 5, 2018)

  • v0.3.3(Oct 26, 2017)

  • v0.3.2(Oct 25, 2017)

    New Features

    • Gain Chart and Lift Chart added to scikitplot.metrics module #71
    • Updated Jupyter notebook examples for v0.3.x by @ljvmiranda921 #69

    Bugfix

    • Changed deprecated spectral colormap to nipy_spectral by @emredjan #66
    Source code(tar.gz)
    Source code(zip)
  • v0.3.1(Sep 17, 2017)

  • v0.3.0(Sep 13, 2017)

    New features:

    • plot_learning_curve has new parameter scoring to allow custom scoring functions. By @jengelman
    • New plotting function plot_calibration_curves

    Deprecations

    • The Factory API has been deprecated and will be removed in v0.4.0
    • scikitplot.plotters has been deprecated and the functions in the Functions API have been distributed to various new modules. See documentation for more details.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8(Sep 8, 2017)

    Features

    • New option hide_zeros for plot_confusion_matrix by @ExcaliburZero. #39
    • New option to plot only certain labels in plot_confusion_matrix by @ExcaliburZero. #41
    • New options to set colormaps for plot_pca_2d_projection, plot_silhouette, plot_precision_recall_curve, plot_roc_curve, and plot_confusion_matrix. #50

    Bugfix:

    • Fixed bug with nan values in confusion matrices by @ExcaliburZero (#42)
    Source code(tar.gz)
    Source code(zip)
  • v0.2.7(Jul 9, 2017)

  • v0.2.6(May 17, 2017)

  • v0.2.5(Apr 30, 2017)

  • v0.2.4(Apr 25, 2017)

  • v0.2.3(Mar 19, 2017)

    New features:

    • plot_precision_recall_curve and plot_ks_statistic now have a new curves argument that allows the user to choose which curves should be plotted. Thanks to @doug-friedman for this PR.
    • Jupyter notebook examples are now available thanks to @lstmemery
    Source code(tar.gz)
    Source code(zip)
  • v0.2.2(Feb 26, 2017)

    New features:

    • plot_pca_2d_projection function
    • plot_pca_component_variance function
    • plots now have a figsize, title_fontsize, and text_fontsize feature to allow user to customize the size of the plot. This is particularly crucial for Jupyter notebook users where the default settings come out too small. Thanks to @frankherfert for this idea.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.1(Feb 19, 2017)

  • v0.2.0(Feb 18, 2017)

  • v0.1.0(Feb 17, 2017)

Owner
Reiichiro Nakano
I like working on awesome things with awesome people!
Reiichiro Nakano
哔咔漫画window客户端,界面使用PySide2,已实现分类、搜索、收藏夹、下载、在线观看、waifu2x等功能。

picacomic-windows 哔咔漫画window客户端,界面使用PySide2,已实现分类、搜索、收藏夹、下载、在线观看等功能。 功能介绍 登陆分流,还原安卓端的三个分流入口 分类,搜索,排行,收藏夹使用同一的逻辑,滚轮下滑自动加载下一页,双击打开 漫画详情,章节列表和评论列表 下载功能,目

1.8k Dec 31, 2022
A TileDB backend for xarray.

TileDB-xarray This library provides a backend engine to xarray using the TileDB Storage Engine. Example usage: import xarray as xr dataset = xr.open_d

TileDB, Inc. 14 Jun 02, 2021
A customized interface for single cell track visualisation based on pcnaDeep and napari.

pcnaDeep-napari A customized interface for single cell track visualisation based on pcnaDeep and napari. 👀 Under construction You can get test image

ChanLab 2 Nov 07, 2021
Simple CLI python app to show a stocks graph performance. Made with Matplotlib and Tiingo.

stock-graph-python Simple CLI python app to show a stocks graph performance. Made with Matplotlib and Tiingo. Tiingo API Key You will need to add your

Toby 3 May 14, 2022
A research of IT labor market based especially on hh.ru. Salaries, rate of technologies and etc.

hh_ru_research Проект реализован в учебных целях анализа рынка труда, в особенности по hh.ru Input data В качестве входных данных используются сериали

3 Sep 07, 2022
A Python package that provides evaluation and visualization tools for the DexYCB dataset

DexYCB Toolkit DexYCB Toolkit is a Python package that provides evaluation and visualization tools for the DexYCB dataset. The dataset and results wer

NVIDIA Research Projects 107 Dec 26, 2022
Time series visualizer is a flexible extension that provides filling world map by country from real data.

Time-series-visualizer Time series visualizer is a flexible extension that provides filling world map by country from csv or json file. You can know d

Long Ng 3 Jul 09, 2021
Render tokei's output to interactive sunburst chart.

Render tokei's output to interactive sunburst chart.

134 Dec 15, 2022
Matplotlib JOTA style for making figures

Matplotlib JOTA style for making figures This repo has Matplotlib JOTA style to format plots and figures for publications and presentation.

JOTA JORNALISMO 2 May 05, 2022
A Python wrapper of Neighbor Retrieval Visualizer (NeRV)

PyNeRV A Python wrapper of the dimensionality reduction algorithm Neighbor Retrieval Visualizer (NeRV) Compile Set up the paths in Makefile then make.

2 Aug 29, 2021
Custom ROI in Computer Vision Applications

EasyROI Helper library for drawing ROI in Computer Vision Applications Table of Contents EasyROI Table of Contents About The Project Tech Stack File S

43 Dec 09, 2022
Log visualizer for whirl-framework

Lumberjack Log visualizer for whirl-framework Установка pip install -r requirements.txt Как пользоваться python3 lumberjack.py -l путь до лога -o

Vladimir Malinovskii 2 Dec 19, 2022
metedraw is a project mainly for data visualization projects of Atmospheric Science, Marine Science, Environmental Science or other majors

It is mainly for data visualization projects of Atmospheric Science, Marine Science, Environmental Science or other majors.

Nephele 11 Jul 05, 2022
Example Code Notebooks for Data Visualization in Python

This repository contains sample code scripts for creating awesome data visualizations from scratch using different python libraries (such as matplotli

Javed Ali 27 Jan 04, 2023
A Python-based non-fungible token (NFT) generator built using Samilla and Matplotlib

PyNFT A Pythonic NF (non-fungible token) generator built using Samilla and Matplotlib Use python pynft.py [amount] The intention behind this generato

Ayush Gundawar 6 Feb 07, 2022
Smarthome Dashboard with Grafana & InfluxDB

Smarthome Dashboard with Grafana & InfluxDB This is a complete overhaul of my Raspberry Dashboard done with Flask. I switched from sqlite to InfluxDB

6 Oct 20, 2022
An open-source plotting library for statistical data.

Lets-Plot Lets-Plot is an open-source plotting library for statistical data. It is implemented using the Kotlin programming language. The design of Le

JetBrains 820 Jan 06, 2023
Generate a 3D Skyline in STL format and a OpenSCAD file from Gitlab contributions

Your Gitlab's contributions in a 3D Skyline gitlab-skyline is a Python command to generate a skyline figure from Gitlab contributions as Github did at

Félix Gómez 70 Dec 22, 2022
Splore - a simple graphical interface for scrolling through and exploring data sets of molecules

Scroll through and exPLORE molecule sets The splore framework aims to offer a si

3 Jun 18, 2022
Simple, realtime visualization of neural network training performance.

pastalog Simple, realtime visualization server for training neural networks. Use with Lasagne, Keras, Tensorflow, Torch, Theano, and basically everyth

Rewon Child 416 Dec 29, 2022