python partial dependence plot toolbox

Overview

PDPbox


Motivation

This repository is inspired by ICEbox. The goal is to visualize the impact of certain features on model predictions for any supervised learning algorithm. It currently supports all scikit-learn algorithms.

The common headache

When using black box machine learning algorithms like random forest and boosting, it is hard to understand the relations between predictors and model outcome.

For example, with a random forest, all we get is the feature importance. The importance calculation tells us which features significantly influence the outcome, but frustratingly it does not tell us in which direction. And in most real cases, the effect is non-monotonic.

We need some powerful tools to help understand the complex relations between predictors and model prediction.
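To make the idea concrete, here is a minimal sketch of how a one-feature partial dependence curve can be computed by hand: force the feature to each grid value for every row, then average the predictions. This is not PDPbox's implementation; `partial_dependence` and `toy_predict` are hypothetical names, and the toy model stands in for any fitted black-box predictor.

```python
import numpy as np


def partial_dependence(predict, X, feature_idx, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for every row
        averages.append(predict(X_mod).mean())
    return np.array(averages)


def toy_predict(X):
    """Hypothetical black box: non-monotonic in feature 0, linear in feature 1."""
    return (X[:, 0] - 0.5) ** 2 + 0.1 * X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
grid = np.linspace(0.0, 1.0, 5)

pdp_curve = partial_dependence(toy_predict, X, feature_idx=0, grid=grid)
```

The resulting curve dips in the middle of the grid and rises at both ends, which is exactly the kind of non-monotonic effect a single importance score cannot show.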

Highlight

  1. Helper functions for visualizing the target distribution as well as the prediction distribution.
  2. Proper handling of one-hot encoded features.
  3. A solution for handling complex mutual dependencies among features.
  4. Support for multi-class classifiers.
  5. Support for two-variable interaction partial dependence plots.

Documentation

Tutorials

https://github.com/SauceCat/PDPbox/tree/master/tutorials

Change Logs

https://github.com/SauceCat/PDPbox/blob/master/CHANGELOG.md

Installation

  • through pip (latest stable version: 0.2.0)

    $ pip install pdpbox
    
  • through git (latest develop version)

    $ git clone https://github.com/SauceCat/PDPbox.git
    $ cd PDPbox
    $ python setup.py install
    

Testing

PDPbox can be tested using tox.

  • First install tox and tox-venv

    $ pip install tox tox-venv
    
  • Call tox inside the pdpbox clone directory. This will run the tests with Python 2.7 and 3.6 (if available).

  • To test the documentation, call tox -e docs. The documentation should open in your browser if it is built successfully. Otherwise, the problems with the documentation will be reported in the output of the command.

Gallery

  • PDP: PDP for a single feature

  • PDP: PDP for a multi-class

  • PDP Interact: PDP Interact for two features with contour plot

  • PDP Interact: PDP Interact for two features with grid plot

  • PDP Interact: PDP Interact for multi-class

  • Information plot: target plot for a single feature

  • Information plot: target interact plot for two features

  • Information plot: actual prediction plot for a single feature

Comments
  • The contour_label_fontsize parameter in _pdp_contour_plot() causes TypeError


    On line 251 of pdp_plot_utils.py, _pdp_contour_plot() passes contour_label_fontsize as a keyword argument to clabel(), which causes the following error:

    TypeError: clabel() got an unexpected keyword argument 'contour_label_fontsize'

    According to the documentation for matplotlib.pyplot.clabel(), the parameter should be called fontsize.

    Source: clabel() documentation
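For reference, a minimal standalone sketch of labelling contour lines with the `fontsize` keyword follows. It does not use PDPbox; the data and variable names are made up for illustration, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up surface: a simple bowl over a 30 x 30 grid.
x = np.linspace(-2.0, 2.0, 30)
y = np.linspace(-2.0, 2.0, 30)
X, Y = np.meshgrid(x, y)
Z = X ** 2 + Y ** 2

fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, levels=8, alpha=0.75)
lines = ax.contour(X, Y, Z, levels=filled.levels, colors="white")

# The label size is passed as `fontsize`, not `contour_label_fontsize`.
labels = ax.clabel(lines, fontsize=9, inline=True)
plt.close(fig)
```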

    to-do 
    opened by angertdevsingh 19
  • Fontsize/Label error in pdp.pdp_interact_plot when contour = True


    This command works fine and produces the expected results:

    fig, axes = pdp.pdp_interact_plot(
        pdp_interact_out = inter1,
        feature_names=['NOx', 'NO_2'],
        plot_type='grid'
    )
    

    However, changing only plot_type to contour gives an error related to the labels and the font size. The figure appears label-less at the bottom after this error. Any guess or help is appreciated.

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-363-7b31c15b4793> in <module>()
          2     pdp_interact_out = inter1,
          3     feature_names=['NOx', 'NO_2'],
    ----> 4     plot_type='contour'
          5 )
    
    /Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp.py in pdp_interact_plot(pdp_interact_out, feature_names, plot_type, x_quantile, plot_pdp, which_classes, figsize, ncols, plot_params)
        773             fig.add_subplot(inter_ax)
        774             _pdp_inter_one(pdp_interact_out=pdp_interact_plot_data[0], inter_ax=inter_ax, norm=None,
    --> 775                            feature_names=feature_names_adj, **inter_params)
        776     else:
        777         wspace = 0.3
    
    /Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp_plot_utils.py in _pdp_inter_one(pdp_interact_out, feature_names, plot_type, inter_ax, x_quantile, plot_params, norm, ticks)
        330             # for numeric not quantile
        331             X, Y = np.meshgrid(pdp_interact_out.feature_grids[0], pdp_interact_out.feature_grids[1])
    --> 332         im = _pdp_contour_plot(X=X, Y=Y, **inter_params)
        333     elif plot_type == 'grid':
        334         im = _pdp_inter_grid(**inter_params)
    
    /Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp_plot_utils.py in _pdp_contour_plot(X, Y, pdp_mx, inter_ax, cmap, norm, inter_fill_alpha, fontsize, plot_params)
        249     c1 = inter_ax.contourf(X, Y, pdp_mx, N=level, origin='lower', cmap=cmap, norm=norm, alpha=inter_fill_alpha)
        250     c2 = inter_ax.contour(c1, levels=c1.levels, colors=contour_color, origin='lower')
    --> 251     inter_ax.clabel(c2, contour_label_fontsize=fontsize, inline=1)
        252     inter_ax.set_aspect('auto')
        253 
    
    /Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/matplotlib/axes/_axes.py in clabel(self, CS, *args, **kwargs)
       6221 
       6222     def clabel(self, CS, *args, **kwargs):
    -> 6223         return CS.clabel(*args, **kwargs)
       6224     clabel.__doc__ = mcontour.ContourSet.clabel.__doc__
       6225 
    
    TypeError: clabel() got an unexpected keyword argument 'contour_label_fontsize'
    

    Thank you in advance. Awesome library by the way!

    opened by jsga 7
  • Having issue using info_plots.actual_plot


    I am following your examples and getting a weird error.

    fig, axes, summary_df = info_plots.actual_plot(model=forest_reg, X=df_new, feature='1', feature_name='1')

    [Two screenshots of the error]

    Thanks a lot in advance!

    opened by grechasneak 5
  • Not exactly an issue: dedup DataFrame


    I recently started using this excellent repository, which fills a much-needed gap in scikit-learn. For clarity, I suggest that the parameter documentation of pdpbox.pdp.pdp_isolate require train_X to be a deduplicated pandas DataFrame; I was confused when I couldn't plot because of indexing issues caused by duplicated values. The fix is as simple as df.drop_duplicates(). Thanks for all of your work!

    EDIT:

    Another data check should be added at line 303 of pdp.py for pdp.pdp_interact. If the feature grids are not specified and default to 10, and train_X.shape[0] is less than 100, you will get an error on line 305 because data_chunk_size rounds to 0. I had to specify num_grid_points=[5, 5] to make it run with train_X.shape[0] = 25.
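The arithmetic behind this report can be sketched as follows. The formula is an assumption: it supposes the chunk size is the row count integer-divided by the number of grid combinations, which matches the numbers quoted above but has not been checked against pdp.py. `safe_chunk_size` is a hypothetical helper, not a PDPbox function.

```python
def safe_chunk_size(n_rows, num_grid_points):
    """Rows per chunk, never smaller than 1 (assumed formula, see lead-in)."""
    n_combinations = num_grid_points[0] * num_grid_points[1]
    return max(1, n_rows // n_combinations)


# 25 rows with the default 10 x 10 grid: plain integer division gives 0.
unguarded = 25 // (10 * 10)

# The guard keeps the chunk size usable for small datasets.
guarded_default = safe_chunk_size(25, [10, 10])
guarded_small_grid = safe_chunk_size(25, [5, 5])
guarded_large_data = safe_chunk_size(1000, [10, 10])
```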

    opened by jrichardhu 4
  • ValueError: cannot reindex from a duplicate axis


    I have a couple of features which are scaled between 0 and 1. For all of those I get a "ValueError: cannot reindex from a duplicate axis". I assume that in creating the columns for the different values of a feature, some rounding happens for their naming, which results in several columns having the same name, although I couldn't trace back the error in the code. Multiplying the column by 10 solves the problem but is of course unintended.

    The error message below.

    Thanks for this beautiful package.

    /home/cdsw/.local/lib/python3.6/site-packages/pdpbox/pdp.py in pdp_plot(pdp_isolate_out, feature_name, center, plot_org_pts, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, figsize, ncols, plot_params, multi_flag, which_class)
        546         _pdp_plot(pdp_isolate_out=pdp_isolate_out, feature_name=feature_name, center=center, plot_org_pts=plot_org_pts, plot_lines=plot_lines,
        547                   frac_to_plot=frac_to_plot, cluster=cluster, n_cluster_centers=n_cluster_centers, cluster_method=cluster_method, x_quantile=x_quantile,
    --> 548                   ax=ax2, plot_params=plot_params)
        549 
        550 

    /home/cdsw/.local/lib/python3.6/site-packages/pdpbox/pdp.py in _pdp_plot(pdp_isolate_out, feature_name, center, plot_org_pts, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, ax, plot_params)
        616         pdp_y -= pdp_y[0]
        617         for col in display_columns[1:]:
    --> 618             ice_lines[col] -= ice_lines[display_columns[0]]
        619         ice_lines['actual_preds'] -= ice_lines[display_columns[0]]
        620         ice_lines[display_columns[0]] = 0

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in f(self, other)
        895 
        896     def f(self, other):
    --> 897         result = method(self, other)
        898 
        899         # this makes sure that we are aligned like the input

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in f(self, other, axis, level, fill_value)
       1552             return _combine_series_frame(self, other, na_op,
       1553                                          fill_value=fill_value, axis=axis,
    -> 1554                                          level=level, try_cast=True)
       1555         else:
       1556             if fill_value is not None:

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in _combine_series_frame(self, other, func, fill_value, axis, level, try_cast)
       1437         # default axis is columns
       1438         return self._combine_match_columns(other, func, level=level,
    -> 1439                                            try_cast=try_cast)
       1440 
       1441 

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/frame.py in _combine_match_columns(self, other, func, level, try_cast)
       4767     def _combine_match_columns(self, other, func, level=None, try_cast=True):
       4768         left, right = self.align(other, join='outer', axis=1, level=level,
    -> 4769                                  copy=False)
       4770 
       4771         new_data = left._data.eval(func=func, other=right,

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/frame.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
       3548                                             method=method, limit=limit,
       3549                                             fill_axis=fill_axis,
    -> 3550                                             broadcast_axis=broadcast_axis)
       3551 
       3552     @Appender(_shared_docs['reindex'] % _shared_doc_kwargs)

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
       7364                                       copy=copy, fill_value=fill_value,
       7365                                       method=method, limit=limit,
    -> 7366                                       fill_axis=fill_axis)
       7367         else:  # pragma: no cover
       7368             raise TypeError('unsupported type: %s' % type(other))

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/generic.py in _align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
       7461 
       7462             if lidx is not None:
    -> 7463                 fdata = fdata.reindex_indexer(join_index, lidx, axis=0)
       7464         else:
       7465             raise ValueError('Must specify axis=0 or 1')

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
       4412         # some axes don't allow reindexing with dups
       4413         if not allow_dups:
    -> 4414             self.axes[axis]._can_reindex(indexer)
       4415 
       4416         if axis >= self.ndim:

    /home/cdsw/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
       3558         # trying to reindex on an axis with duplicates
       3559         if not self.is_unique and len(indexer):
    -> 3560             raise ValueError("cannot reindex from a duplicate axis")
       3561 
       3562     def reindex(self, target, method=None, level=None, limit=None,

    ValueError: cannot reindex from a duplicate axis
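The reporter's guess, that rounding grid values when naming columns produces duplicate labels, can be illustrated with a small sketch. This only shows how such a collision could arise; the two-decimal rounding and the sample values are assumptions made for illustration, and nothing here confirms that PDPbox names its columns this way.

```python
def rounded_labels(grid_values, decimals=2):
    """Turn grid values into string labels by rounding (assumed naming scheme)."""
    return [str(round(value, decimals)) for value in grid_values]


def has_duplicates(labels):
    """True when at least two labels are identical."""
    return len(set(labels)) != len(labels)


# Hypothetical grid for a feature scaled to [0, 1]: two values collapse to '0.11'.
small_scale = rounded_labels([0.111, 0.114, 0.52])

# The same grid multiplied by 10 keeps its labels distinct.
large_scale = rounded_labels([1.11, 1.14, 5.2])

collision_small = has_duplicates(small_scale)
collision_large = has_duplicates(large_scale)
```

This is consistent with the observation that multiplying the column by 10 makes the error disappear.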

    opened by jmoberreuter 3
  • pdp_isolate_obj, pdp_interact_obj don't pickle


    If you try to pickle a pdp_isolate_obj you get a PicklingError:

    PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
    

    The reason is that currently the model's predict (or predict_proba) method is added as a class member to the object, and pickling instance methods is verboten.

    As far as I can tell, there's no reason to add the predict method to class. The pdp_isolate_obj.predict member isn't used anywhere in the code, and it could quite easily be reconstructed from the model, if it were needed. I'd propose to simply remove this member. Happy to submit a PR, if desired.

    opened by mqk 3
  • pdp_isolate fails for regression tasks


    Hi - firstly I'd like to thank you for producing this package, it's really great! I was just reading the ICEBox paper recently and was considering building something, but was delighted to see somebody else already had :)

    I'm having issues with calling pdp_isolate on a regression model - it throws the following exception:

    usr/local/lib/python3.5/dist-packages/PDPbox-0.1-py3.5.egg/pdpbox/pdp.py in pdp_isolate(model, train_X, feature, num_grid_points, percentile_range)
        113     # store the ICE lines
        114     # for multi-classifier, a dictionary is created
    --> 115     if n_classes > 2:
        116         ice_lines = {}
        117         for n_class in range(n_classes):
    
    TypeError: unorderable types: NoneType() > int()
    

    Even the 'Regression.ipynb' example in PDPbox/test/Regression/ does this. A cursory glance at the codebase seems to suggest that when we have a sklearn model without a classes property, n_classes gets set to None on pdp.py line 64. Then all subsequent comparisons of n_classes to an integer will throw this error. Any suggestions?
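One way to guard against this is sketched below. It assumes, as the report suggests, that the class count is read from the model's `classes_` attribute (the attribute scikit-learn classifiers expose after fitting). `count_classes` and the two toy model classes are hypothetical names for illustration, not PDPbox code.

```python
def count_classes(model):
    """Return the number of classes, or 0 when the model has no `classes_`."""
    classes = getattr(model, "classes_", None)
    return 0 if classes is None else len(classes)


class ToyRegressor:
    """Stand-in for a fitted regressor: no `classes_` attribute."""


class ToyClassifier:
    """Stand-in for a fitted three-class classifier."""
    classes_ = [0, 1, 2]


# An integer result keeps comparisons such as `n_classes > 2` valid in Python 3,
# where comparing None with an int raises TypeError.
regressor_is_multiclass = count_classes(ToyRegressor()) > 2
classifier_is_multiclass = count_classes(ToyClassifier()) > 2
```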

    opened by NMRobert 3
  • PDPbox saved XGBoost models do not play well with latest XGBoost


    I am trying to execute the code:

    from pdpbox import pdp, get_dataset, info_plots
    test_titanic = get_dataset.titanic()
    

    I get the error below. Environment: PDPbox 0.2.0+13.g73c6966, XGBoost 1.1.0-SNAPSHOT, conda.

    Stacktrace:

    XGBoostError                              Traceback (most recent call last)
    <ipython-input-2-931a5e8d7b9f> in <module>
    ----> 1 test_titanic = get_dataset.titanic()
    
    ~/anaconda3/lib/python3.6/site-packages/PDPbox-0.2.0+13.g73c6966-py3.6.egg/pdpbox/get_dataset.py in titanic()
          7 
          8 def titanic():
    ----> 9         dataset = joblib.load(os.path.join(DIR, 'datasets/test_titanic.pkl'))
         10         return dataset
         11 
    
    ~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
        603                     return load_compatibility(fobj)
        604 
    --> 605                 obj = _unpickle(fobj, filename, mmap_mode)
        606 
        607     return obj
    
    ~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
        527     obj = None
        528     try:
    --> 529         obj = unpickler.load()
        530         if unpickler.compat_mode:
        531             warnings.warn("The file '%s' has been generated with a "
    
    ~/anaconda3/lib/python3.6/pickle.py in load(self)
       1048                     raise EOFError
       1049                 assert isinstance(key, bytes_types)
    -> 1050                 dispatch[key[0]](self)
       1051         except _Stop as stopinst:
       1052             return stopinst.value
    
    ~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in load_build(self)
        340         NDArrayWrapper is used for backward compatibility with joblib <= 0.9.
        341         """
    --> 342         Unpickler.load_build(self)
        343 
        344         # For backward compatibility, we support NDArrayWrapper objects.
    
    ~/anaconda3/lib/python3.6/pickle.py in load_build(self)
       1505         setstate = getattr(inst, "__setstate__", None)
       1506         if setstate is not None:
    -> 1507             setstate(state)
       1508             return
       1509         slotstate = None
    
    ~/anaconda3/lib/python3.6/site-packages/xgboost/core.py in __setstate__(self, state)
       1096             ptr = (ctypes.c_char * len(buf)).from_buffer(buf)
       1097             _check_call(
    -> 1098                 _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
       1099             state['handle'] = handle
       1100         self.__dict__.update(state)
    
    ~/anaconda3/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
        187     """
        188     if ret != 0:
    --> 189         raise XGBoostError(py_str(_LIB.XGBGetLastError()))
        190 
        191 
    
    XGBoostError: [18:53:06] /home/sergey/xgboost/src/learner.cc:834: Check failed: header == serialisation_header_: 
    
      If you are loading a serialized model (like pickle in Python) generated by older
      XGBoost, please export the model by calling `Booster.save_model` from that version
      first, then load it back in current version.  There's a simple script for helping
      the process. See:
    
        https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
    
      for reference to the script, and more details about differences between saving model and
      serializing.
    
    
    Stack trace:
      [bt] (0) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x64) [0x7fe81e08c784]
      [bt] (1) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerIO::Load(dmlc::Stream*)+0x674) [0x7fe81e19f444]
      [bt] (2) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterUnserializeFromBuffer+0x5e) [0x7fe81e07f61e]
      [bt] (3) /home/sergey/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fe84c23d630]
      [bt] (4) /home/sergey/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fe84c23cfed]
      [bt] (5) /home/sergey/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fe84b3c509e]
      [bt] (6) /home/sergey/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13ad5) [0x7fe84b3c5ad5]
      [bt] (7) /home/sergey/anaconda3/bin/python -m ipykernel -f /home/sergey/.local/share/jupyter/runtime/kernel-813f0269-7bc5-4ef8-b890-fb9b799698ce.json(_PyObject_FastCallDict+0x8b) [0x559094256f8b]
      [bt] (8) /home/sergey/anaconda3/bin/python -m ipykernel -f /home/sergey/.local/share/jupyter/runtime/kernel-813f0269-7bc5-4ef8-b890-fb9b799698ce.json(+0x1a162e) [0x5590942e562e]
    
    to-do 
    opened by sbushmanov 2
  • PDP for One hot encoded features

    I have a one-hot encoded feature whose resulting datatype is numeric. When I plot the PDP for this feature, I get a strange plot that represents nothing, as in the attached image.

    The plot works fine for other numeric feature columns. It only fails for this one-hot encoded feature. Any suggestions?

    opened by swaticolab 2
  • info_plots.actual_plot() got an error

    When I execute the following code, just as in the binary_classification tutorial:

    fig, axes, summary_df = info_plots.actual_plot(
        model=titanic_model,
        X=titanic_data[titanic_features],
        feature=['Embarked_C', 'Embarked_S', 'Embarked_Q'],
        feature_name='embarked'
    )

    I get the following error:

    TypeError: predict_proba() argument after ** must be a mapping, not NoneType

    I also tried lgb.LGBMClassifier and a raw lgb model on my own data but got the same error. Does anyone know how to fix it?

    opened by fenxouxiaoquan 2
  • pdp_interact_plot dimension reference subplot out of alignment.


    Here is my code to reproduce the problem:

    from pdpbox import pdp, get_dataset, info_plots
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    import pandas as pd
    import matplotlib.pyplot as plt
    
    %matplotlib inline
    
    # Setup data
    data = load_iris()
    df = pd.DataFrame(data.data, columns = data.feature_names)
    df.index = data.target
    
    # Train basic model
    estimator = RandomForestClassifier()
    model = estimator.fit(df, df.index)
    
    #  pdp_interactions
    pdp_paid= pdp.pdp_interact(
        model=model, dataset=df, model_features=df.columns, features=df.columns, 
        num_grid_points=[5, 5, 5], 
        percentile_ranges=[None, None, None], 
        n_jobs=4
    )
    
    # plotting
    fig, axes = pdp.pdp_interact_plot(
        pdp_paid, ['petal length (cm)', 'petal width (cm)'], plot_type='grid',x_quantile=True, ncols=2, plot_pdp=True, 
        which_classes=[0, 1, 2]
    )
    


    • pdpbox.version == 0.2.0
    • matplotlib.version == 3.0.2

    The problem is that in your reference docs, the subplots showing the dimensional values to the left of and above each class plot are aligned with the grid of the figure, whereas in my output they seem to be squished. I can probably figure out how to reference the axis or figure directly and correct them, but is this expected? Is there an easy fix?

    Thanks! Great library!

    to-do 
    opened by dyerrington 2
  • Use scikit-learn instead of sklearn


    Otherwise it fails to install:

    #36 272.1   × python setup.py egg_info did not run successfully.
    #36 272.1   │ exit code: 1
    #36 272.1   ╰─> [18 lines of output]
    #36 272.1       The 'sklearn' PyPI package is deprecated, use 'scikit-learn'
    #36 272.1       rather than 'sklearn' for pip commands.
    #36 272.1       
    #36 272.1       Here is how to fix this error in the main use cases:
    #36 272.1       - use 'pip install scikit-learn' rather than 'pip install sklearn'
    #36 272.1       - replace 'sklearn' by 'scikit-learn' in your pip requirements files
    #36 272.1         (requirements.txt, setup.py, setup.cfg, Pipfile, etc ...)
    #36 272.1       - if the 'sklearn' package is used by one of your dependencies,
    #36 272.1         it would be great if you take some time to track which package uses
    #36 272.1         'sklearn' instead of 'scikit-learn' and report it to their issue tracker
    #36 272.1       - as a last resort, set the environment variable
    #36 272.1         SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True to avoid this error
    #36 272.1       
    #36 272.1       More information is available at
    #36 272.1       https://github.com/scikit-learn/sklearn-pypi-package
    
    opened by Philmod 2
  • help!


    Have to admit that your package works really well, I really like the contour plots in it to see the impact of the two attribute features. I would like to ask if it is possible to make a 3D plot to observe the common influence of 3 attribute features?

    opened by Turningl 0
  •  Failed building wheel for matplotlib

    While trying to install pdpbox, a conflict seems to have occurred with my matplotlib:

    note: This error originates from a subprocess, and is likely not a problem with pip.
    ERROR: Failed building wheel for matplotlib
    Running setup.py clean for matplotlib
    Failed to build matplotlib
    Installing collected packages: matplotlib, sklearn, pdpbox
    Attempting uninstall: matplotlib
    Found existing installation: matplotlib 3.5.1
    Uninstalling matplotlib-3.5.1:
    Successfully uninstalled matplotlib-3.5.1
    Running setup.py install for matplotlib ... error
    error: subprocess-exited-with-error

    Why did it try to uninstall the version of matplotlib I had installed? How can I fix the problem?

    opened by hofong428 2
  • Replace the value grid calculation by removing nans


    Hello,

    It would be good to replace the grid value calculation in pdp_calc_utils.py, line 237:

    value_grids = np.percentile(feature_values, percentile_grids)

    with:

    value_grids = np.nanpercentile(feature_values, percentile_grids)

    This avoids returning an array that contains only NaNs when the dataset has a large number of NaNs.
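The difference between the two NumPy calls can be seen in a small sketch. The sample values are made up; only `np.percentile` and `np.nanpercentile` come from the suggestion above.

```python
import numpy as np

# Made-up feature column with one missing value.
feature_values = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
percentile_grids = [0, 50, 100]

# np.percentile propagates the NaN into every requested percentile.
with_nan = np.percentile(feature_values, percentile_grids)

# np.nanpercentile ignores NaNs and computes percentiles of the remaining values.
without_nan = np.nanpercentile(feature_values, percentile_grids)
```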

    opened by ciornav 0
Releases(v0.2.1)
  • v0.2.1(Mar 14, 2021)

    • Update tutorials for xgboost==1.3.3
    • Add simple model training in tutorials for better understanding and reproducing
    • Fix charts for matplotlib==3.1.1
    • Remove large .pkl files, use separate files for data (*.csv), info (features, target in *.json), and model (.pkl)
    • Simplify unit tests, removing all model-dependent test cases
    • Fix issues in Tox and Travis CI
Owner
Li Jiangchun
If I don't create, I don't understand.