pandas, scikit-learn, xgboost and seaborn integration

pandas-ml

Overview

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml
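
Several of the issue reports further down trace import failures to newer pandas and scikit-learn releases. It can therefore help to check what is installed before importing pandas_ml. The following is a minimal sketch that uses only the standard library's importlib.metadata (Python 3.8+). `installed_versions` is a hypothetical helper, not part of pandas_ml, and the distribution names are assumed PyPI names.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version string, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # PyPI distribution names (assumed for this sketch).
    names = ["pandas-ml", "pandas", "scikit-learn", "xgboost"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

The helper only reports versions. Which combinations work with pandas_ml is not decided here.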

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split into training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
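
test_df.metrics.confusion_matrix() above tabulates targets (rows) against predictions (columns). The sketch below reproduces that layout for plain label lists using only the standard library. It illustrates the computation and is not pandas_ml's implementation. `confusion_matrix` and `accuracy` are hypothetical names chosen for this example. The accuracy helper shows how a single score is read off the diagonal.

```python
from collections import Counter


def confusion_matrix(targets, predictions):
    """Count (target, predicted) pairs.

    Returns (labels, rows) where rows[i][j] is the number of samples whose
    target is labels[i] and whose prediction is labels[j], i.e. targets as
    rows and predictions as columns.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    labels = sorted(set(targets) | set(predictions))
    counts = Counter(zip(targets, predictions))  # missing pairs count as 0
    rows = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, rows


def accuracy(rows):
    """Fraction of samples on the diagonal (target == prediction)."""
    total = sum(sum(row) for row in rows)
    correct = sum(rows[i][i] for i in range(len(rows)))
    return correct / total


if __name__ == "__main__":
    labels, rows = confusion_matrix([0, 1, 2, 1], [0, 2, 2, 1])
    print("Predicted", labels)
    for label, row in zip(labels, rows):
        print(label, row)
    print("accuracy:", accuracy(rows))
```

For the matrix printed above, the diagonal sums to 411 of the 450 test rows, an accuracy of about 0.913.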

Supported Packages

  • scikit-learn
  • patsy
  • xgboost

Comments
  • Fixed imports of deprecated modules which were removed in pandas 0.24.0

    Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

    opened by kristofve 8
  • REL: v0.4.0

    • [x] Compat/test for sklearn 0.18.0 (#81)
      • [x] initial fix (#81)
      • [x] wrapper for cross validation classes (re-enable skipped tests) (#85)
      • [x] tests for multioutput (#86)
      • [x] Update doc
    • [x] Compat/test for pandas 0.19.0 (#83)
    • [x] Update release note (#88)
    opened by sinhrks 4
  • Import error

    I tried to import pandas_ml but it gave the error:

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    I'm running Python 3.8.1 and I installed pandas_ml via pip (pip version 20.0.2).

    I dug into the code; the error is raised on line 80 of series.py:

    @Appender(pd.core.generic.NDFrame.groupby.__doc__)

    pandas is imported at the top of that file with a plain import pandas as pd.

    I guess there is a version incompatibility...

    Thanks in advance for any help.

    opened by ierezell 2
  • Confusion Matrix not accessible

    Hi,

    I have been using confusion_matrix since it was an independent package. I installed pandas_ml to keep using it, but the setup.py script does not seem to install that package.

    Could it be an issue with the find_packages function?

    opened by mmartinortiz 2
  • Seaborn Scatterplot matrix / pairplot integration

    import seaborn as sns
    sns.set()
    
    df = sns.load_dataset("iris")
    sns.pairplot(df, hue="species")
    

    displays

    [image: iris_scatter_matrix]

    but pairplot doesn't work the same way with ModelFrame:

    import pandas as pd
    pd.set_option('display.max_rows', 10)  # full option key; a bare 'max_rows' is ambiguous in newer pandas
    import sklearn.datasets as datasets
    import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
    import seaborn as sns
    import matplotlib.pyplot as plt
    df = pdml.ModelFrame(datasets.load_iris())
    sns.pairplot(df, hue=".target")
    

    [image: iris_modelframe]

    There are some useless subplots.

    opened by scls19fr 2
  • Error while running train.py from speech commands in tensorflow examples.

    I get the following error:

    File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
    File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    This happens with both versions 5 and 6.1.

    opened by ayush7 1
  • error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

    Code from the example https://pandas-ml.readthedocs.io/en/latest/xgboost.html:

    import pandas_ml as pdml
    import sklearn.datasets as datasets
    df = pdml.ModelFrame(datasets.load_digits())
    train_df, test_df = df.cross_validation.train_test_split()
    estimator = df.xgboost.XGBClassifier()
    train_df.fit(estimator)
    predicted = test_df.predict(estimator)
    q=1
    test_df.metrics.confusion_matrix()
    train_df.xgboost.plot_importance()

    tuned_parameters = [{'max_depth': [3, 4]}]
    cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

    df.fit(cv)
    df.grid_search.describe(cv)
    q=1

    gives the error:

    File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10
        train_df.xgboost.plot_importance()
    File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
        return xgb.plot_importance(self._df.estimator.booster(),

    builtins.TypeError: 'str' object is not callable

    I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

    opened by Sandy4321 1
  • pandas 0.24.0 has deprecated pandas.util.decorators

    See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

    This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

    It looks like the import just needs to change to 'from pandas.util'.

    opened by usul83 1
  • 'mean_absoloute_error

    from sklearn import metrics
    print('MAE:', metrics.mean_absoloute_error(y_test, y_pred))

    module 'sklearn.metrics' has no attribute 'mean_absoloute_error'

    This error occurred. Any solution?

    opened by vikramk1507 0
  • AttributeError: type object 'NDFrame' has no attribute 'groupby'

    from pandas_ml import ConfusionMatrix
    cm = ConfusionMatrix(actu, pred)
    cm.print_stats()

    AttributeError                            Traceback (most recent call last)
    ----> 1 from pandas_ml import confusion_matrix
          2
          3 cm = ConfusionMatrix(actu, pred)
          4 cm.print_stats()

    /usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in <module>
          1 #!/usr/bin/env python
          2
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa
          4 from pandas_ml.tools import info # noqa
          5 from pandas_ml.version import version as __version__ # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in <module>
          1 #!/usr/bin/env python
          2
    ----> 3 from pandas_ml.core.frame import ModelFrame # noqa
          4 from pandas_ml.core.series import ModelSeries # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in <module>
         16 from pandas_ml.core.accessor import _AccessorMethods
         17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
    ---> 18 from pandas_ml.core.series import ModelSeries
         19
         20

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in <module>
          9
         10
    ---> 11 class ModelSeries(ModelTransformer, pd.Series):
         12     """
         13     Wrapper for pandas.Series to support sklearn.preprocessing

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
         78         return df
         79
    ---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
         81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
         82                 group_keys=True, squeeze=False):

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    opened by gfranco008 5
  • AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

    I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

    opened by petraknovak 11
  • Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

    I get the following error:

    File "train.py", line 27, in <module>
        from callbacks import ConfusionMatrixCallback
    File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
        from pandas_ml import ConfusionMatrix
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
        from pandas_ml.core import ModelFrame, ModelSeries # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
        from pandas_ml.core.frame import ModelFrame # noqa
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
        from pandas_ml.core.series import ModelSeries
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
        class ModelSeries(ModelTransformer, pd.Series):
    File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
        @Appender(pd.core.generic.NDFrame.groupby.__doc__)
    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    This happens with both versions 5 and 6.1.

    opened by ayush7 3
  • Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

    From the scikit-learn documentation:

    sklearn.preprocessing.Imputer (Warning: DEPRECATED)

    class sklearn.preprocessing.Imputer(*args, **kwargs)
    Imputation transformer for completing missing values.

    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-e0471065d85c> in <module>
          1 import pandas as pd
          2 import numpy as np
    ----> 3 import pandas_ml as pdml
          4 a1 = np.random.randint(0,2,size=(100,2))
          5 df = pd.DataFrame(a1,columns=['i1','i2'])
    
    C:\g\test\lib\pandas_ml\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries       # noqa
          4 from pandas_ml.tools import info                         # noqa
          5 from pandas_ml.version import version as __version__     # noqa
    
    C:\g\test\lib\pandas_ml\core\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core.frame import ModelFrame       # noqa
          4 from pandas_ml.core.series import ModelSeries     # noqa
    
    C:\g\test\lib\pandas_ml\core\frame.py in <module>
          8 
          9 import pandas_ml.imbaccessors as imbaccessors
    ---> 10 import pandas_ml.skaccessors as skaccessors
         11 import pandas_ml.smaccessors as smaccessors
         12 import pandas_ml.snsaccessors as snsaccessors
    
    C:\g\test\lib\pandas_ml\skaccessors\__init__.py in <module>
         17 from pandas_ml.skaccessors.neighbors import NeighborsMethods                      # noqa
         18 from pandas_ml.skaccessors.pipeline import PipelineMethods                        # noqa
    ---> 19 from pandas_ml.skaccessors.preprocessing import PreprocessingMethods              # noqa
         20 from pandas_ml.skaccessors.svm import SVMMethods                                  # noqa
    
    C:\g\test\lib\pandas_ml\skaccessors\preprocessing.py in <module>
         11     _keep_col_classes = [pp.Binarizer,
         12                          pp.FunctionTransformer,
    ---> 13                          pp.Imputer,
         14                          pp.KernelCenterer,
         15                          pp.LabelEncoder,
    
    AttributeError: module 'sklearn.preprocessing' has no attribute 'Imputer'
    
    opened by apiszcz 11
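
Three of the reports above come from pandas_ml 0.6.1 looking up names that newer pandas or scikit-learn releases no longer provide. These are the NDFrame.groupby, Imputer and jaccard_similarity_score errors. The groupby and Imputer lookups run while a class body is being executed, so `import pandas_ml` itself fails.

The sketch below is a minimal, stdlib-only way to check which of those names an environment still has before importing pandas_ml. `resolve` and `missing_attributes` are hypothetical helpers, not part of pandas_ml. A package that is not installed at all is also reported as missing.

```python
import importlib


def resolve(dotted_path):
    """Import the longest importable module prefix of dotted_path, then walk
    the remaining parts with getattr.

    Raises ImportError if no prefix is importable and AttributeError if a
    later attribute is missing.
    """
    parts = dotted_path.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # not a module (or not installed); try a shorter prefix
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"no importable module prefix in {dotted_path!r}")


def missing_attributes(dotted_paths):
    """Return the subset of dotted_paths that cannot be resolved."""
    missing = []
    for path in dotted_paths:
        try:
            resolve(path)
        except (ImportError, AttributeError):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Names that pandas_ml 0.6.1 looks up, taken from the reports above.
    probes = [
        "pandas.core.generic.NDFrame.groupby",
        "sklearn.preprocessing.Imputer",
        "sklearn.metrics.jaccard_similarity_score",
    ]
    for path in missing_attributes(probes):
        print("missing:", path)
```

In an environment affected by those reports, the corresponding names are printed as missing. An empty result only means these three lookups succeed. It does not show that pandas_ml is fully compatible.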

Releases (v0.6.1)
Module for statistical learning, with a particular emphasis on time-dependent modelling

tick is a Python 3 module for statistical learning, with a particular emphasis on time-dependent

X - Data Science Initiative 410 Dec 14, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processi

Salesforce 2.8k Jan 05, 2023
CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL)

CyLP CyLP is a Python interface to COIN-OR’s Linear and mixed-integer program solvers (CLP, CBC, and CGL). CyLP’s unique feature is that you can use i

COIN-OR Foundation 161 Dec 14, 2022
Evidently helps analyze machine learning models during validation or production monitoring

Evidently helps analyze machine learning models during validation or production monitoring. The tool generates interactive visual reports and JSON profiles from pandas DataFrame or csv files. Current

Evidently AI 3.1k Jan 07, 2023
BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models.

Model Serving Made Easy BentoML is a flexible, high-performance framework for serving, managing, and deploying machine learning models. Supports multi

BentoML 4.4k Jan 04, 2023
Provide an input CSV and a target field to predict, generate a model + code to run it.

automl-gs Give an input CSV file and a target field you want to predict to automl-gs, and get a trained high-performing machine learning or deep learn

Max Woolf 1.8k Jan 04, 2023
Combines Bayesian analyses from many datasets.

PosteriorStacker combines Bayesian analyses from many datasets. Fitting a model to a d

Johannes Buchner 19 Feb 13, 2022
Python module for data science and machine learning users.

dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu

Emmanuel ASIFIWE 1 Nov 23, 2021
MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine Learning work with thousands of other users.

The collaboration platform for Machine Learning MLReef is an open source ML-Ops platform that helps you collaborate, reproduce and share your Machine

MLReef 1.4k Dec 27, 2022
A Lightweight Hyperparameter Optimization Tool 🚀

The mle-hyperopt package provides a simple and intuitive API for hyperparameter optimization of your Machine Learning Experiment (MLE) pipeline.

Robert Lange 137 Dec 02, 2022
Steganography is the art of hiding the fact that communication is taking place, by hiding information in other information.

Priyansh Sharma 7 Nov 09, 2022
A toolbox to iNNvestigate neural networks' predictions!

iNNvestigate neural networks!

Maximilian Alber 1.1k Jan 05, 2023
A python library for Bayesian time series modeling

PyDLM Welcome to pydlm, a flexible time series modeling library for python. This library is based on the Bayesian dynamic linear model (Harrison and W

Sam 438 Dec 17, 2022
This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing variance.

minvar_invest_portfolio This project used bitcoin, S&P500, and gold to construct an investment portfolio that aimed to minimize risk by minimizing var

1 Jan 06, 2022
A python fast implementation of the famous SVD algorithm popularized by Simon Funk during Netflix Prize

⚡ funk-svd funk-svd is a Python 3 library implementing a fast version of the famous SVD algorithm popularized by Simon Funk during the Netflix Prize co

Geoffrey Bolmier 171 Dec 19, 2022
AutoOED: Automated Optimal Experiment Design Platform

AutoOED is an optimal experiment design platform powered with automated machine learning to accelerate the discovery of optimal solutions. Our platform solves multi-objective optimization problems an

Yunsheng Tian 107 Jan 03, 2023
Dual Adaptive Sampling for Machine Learning Interatomic potential.

DAS Dual Adaptive Sampling for Machine Learning Interatomic potential. How to cite If you use this code in your research, please cite this using: Hong

6 Jul 06, 2022
Distributed deep learning on Hadoop and Spark clusters.

Note: we're lovingly marking this project as Archived since we're no longer supporting it. You are welcome to read the code and fork your own version

Yahoo 1.3k Dec 28, 2022
Quantum Machine Learning

The Machine Learning package simply contains sample datasets at present. It has some classification algorithms such as QSVM and VQC (Variational Quantum Classifier), where this data can be used for e

Qiskit 364 Jan 08, 2023
Model factory is a ML training platform to help engineers to build ML models at scale

Model Factory Machine learning is powering many businesses today, e.g., search engine, e-commerce, news or feed recommendation. Training high qu

16 Sep 23, 2022