pandas, scikit-learn, xgboost and seaborn integration

Overview

pandas-ml

Latest Docs https://travis-ci.org/pandas-ml/pandas-ml.svg?branch=master

Overview

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split to training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# Evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38

Supported Packages

  • scikit-learn
  • patsy
  • xgboost
Comments
  • Fixed imports of deprecated modules which were removed in pandas 0.24.0

    Fixed imports of deprecated modules which were removed in pandas 0.24.0

    Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

    opened by kristofve 8
  • REL: v0.4.0

    REL: v0.4.0

    • [x] Compat/test for sklearn 0.18.0 (#81)
      • [x] initial fix (#81)
      • [x] wrapper for cross validation classes (re-enable skipped tests) (#85)
      • [x] tests for multioutput (#86)
      • [x] Update doc
    • [x] Compat/test for pandas 0.19.0 (#83)
    • [x] Update release note (#88)
    opened by sinhrks 4
  • Importation error

    Importation error

    I tried to import pandas_ml but it gave the error :

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    I'm running python3.8.1 and I installed pandas_ml via pip (version 20.0.2)

    I dig in the code, error is l.80 of file series.py

    @Appender(pd.core.generic.NDFrame.groupby.__doc__)

    Here pandas is imported at the top of the file with a classic import pandas as pd

    I guess there is a problem with the versions...

    Thanks in advance for any help

    opened by ierezell 2
  • Confusion Matrix no accessible

    Confusion Matrix no accessible

    Hi,

    I've been using confusion_matrix since it was an independent package. I've installed pandas_ml to continue using the package, but it seems that the setup.py script does not install the package.

    Could it be an issue with the find_packages function?

    opened by mmartinortiz 2
  • Seaborn Scatterplot matrix / pairplot integration

    Seaborn Scatterplot matrix / pairplot integration

    import seaborn as sns
    sns.set()
    
    df = sns.load_dataset("iris")
    sns.pairplot(df, hue="species")
    

    displays

    iris_scatter_matrix

    but pairplot doesn't work the same way with ModelFrame

    import pandas as pd
    pd.set_option('max_rows', 10)
    import sklearn.datasets as datasets
    import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
    import seaborn as sns
    import matplotlib.pyplot as plt
    df = pdml.ModelFrame(datasets.load_iris())
    sns.pairplot(df, hue=".target")
    

    iris_modelframe

    There is some useless subplots

    opened by scls19fr 2
  • Error while running train.py from speech commands in tensorflow examples.

    Error while running train.py from speech commands in tensorflow examples.

    Have the following error: File "train.py", line 27, in <module> from callbacks import ConfusionMatrixCallback File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module> from pandas_ml import ConfusionMatrix File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module> from pandas_ml.core import ModelFrame, ModelSeries # noqa File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module> from pandas_ml.core.frame import ModelFrame # noqa File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module> from pandas_ml.core.series import ModelSeries File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module> class ModelSeries(ModelTransformer, pd.Series): File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries @Appender(pd.core.generic.NDFrame.groupby.__doc__) AttributeError: type object 'NDFrame' has no attribute 'groupby' Happening with both version 5 and 6.1

    opened by ayush7 1
  • error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

    error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

    code from example https://pandas-ml.readthedocs.io/en/latest/xgboost.html '''import pandas_ml as pdml import sklearn.datasets as datasets df = pdml.ModelFrame(datasets.load_digits()) train_df, test_df = df.cross_validation.train_test_split() estimator = df.xgboost.XGBClassifier() train_df.fit(estimator) predicted = test_df.predict(estimator) q=1 test_df.metrics.confusion_matrix() train_df.xgboost.plot_importance()

    tuned_parameters = [{'max_depth': [3, 4]}] cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

    df.fit(cv) df.grid_search.describe(cv) q=1

    '''

    gives error ''' File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in train_df.xgboost.plot_importance() File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance return xgb.plot_importance(self._df.estimator.booster(),

    builtins.TypeError: 'str' object is not callable ''' I use Windows and 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)] Python Type "help", "copyright", "credits" or "license" for more information.

    opened by Sandy4321 1
  • pandas 0.24.0 has deprecated pandas.util.decorators

    pandas 0.24.0 has deprecated pandas.util.decorators

    See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

    This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

    Looks like just need to change it to 'from pandas.utils'

    opened by usul83 1
  • 'mean_absoloute_error

    'mean_absoloute_error

    from sklearn import metrics print('MAE:',metrics.mean_absoloute_error(y_test,y_pred)) module 'sklearn.metrics' has no attribute 'mean_absoloute_error This error is occurred..any solution

    opened by vikramk1507 0
  • AttributeError: type object 'NDFrame' has no attribute 'groupby'

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    from pandas_ml import ConfusionMatrix cm = ConfusionMatrix(actu, pred) cm.print_stats()


    AttributeError Traceback (most recent call last) in ----> 1 from pandas_ml import confusion_matrix 2 3 cm = ConfusionMatrix(actu, pred) 4 cm.print_stats()

    /usr/local/lib/python3.8/site-packages/pandas_ml/init.py in 1 #!/usr/bin/env python 2 ----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa 4 from pandas_ml.tools import info # noqa 5 from pandas_ml.version import version as version # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/init.py in 1 #!/usr/bin/env python 2 ----> 3 from pandas_ml.core.frame import ModelFrame # noqa 4 from pandas_ml.core.series import ModelSeries # noqa

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in 16 from pandas_ml.core.accessor import _AccessorMethods 17 from pandas_ml.core.generic import ModelPredictor, _shared_docs ---> 18 from pandas_ml.core.series import ModelSeries 19 20

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in 9 10 ---> 11 class ModelSeries(ModelTransformer, pd.Series): 12 """ 13 Wrapper for pandas.Series to support sklearn.preprocessing

    /usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries() 78 return df 79 ---> 80 @Appender(pd.core.generic.NDFrame.groupby.doc) 81 def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True, 82 group_keys=True, squeeze=False):

    AttributeError: type object 'NDFrame' has no attribute 'groupby'

    opened by gfranco008 5
  • AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

    AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

    I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

    opened by petraknovak 11
  • Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

    Have the following error: File "train.py", line 27, in <module> from callbacks import ConfusionMatrixCallback File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module> from pandas_ml import ConfusionMatrix File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module> from pandas_ml.core import ModelFrame, ModelSeries # noqa File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module> from pandas_ml.core.frame import ModelFrame # noqa File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module> from pandas_ml.core.series import ModelSeries File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module> class ModelSeries(ModelTransformer, pd.Series): File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries @Appender(pd.core.generic.NDFrame.groupby.__doc__) AttributeError: type object 'NDFrame' has no attribute 'groupby' Happening with both version 5 and 6.1

    opened by ayush7 3
  • Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

    Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

    SKLEARN

    sklearn.preprocessing.Imputer Warning DEPRECATED

    class sklearn.preprocessing.Imputer(*args, **kwargs)[source] Imputation transformer for completing missing values.

    Read more in the User Guide.

    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-e0471065d85c> in <module>
          1 import pandas as pd
          2 import numpy as np
    ----> 3 import pandas_ml as pdml
          4 a1 = np.random.randint(0,2,size=(100,2))
          5 df = pd.DataFrame(a1,columns=['i1','i2'])
    
    C:\g\test\lib\pandas_ml\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core import ModelFrame, ModelSeries       # noqa
          4 from pandas_ml.tools import info                         # noqa
          5 from pandas_ml.version import version as __version__     # noqa
    
    C:\g\test\lib\pandas_ml\core\__init__.py in <module>
          1 #!/usr/bin/env python
          2 
    ----> 3 from pandas_ml.core.frame import ModelFrame       # noqa
          4 from pandas_ml.core.series import ModelSeries     # noqa
    
    C:\g\test\lib\pandas_ml\core\frame.py in <module>
          8 
          9 import pandas_ml.imbaccessors as imbaccessors
    ---> 10 import pandas_ml.skaccessors as skaccessors
         11 import pandas_ml.smaccessors as smaccessors
         12 import pandas_ml.snsaccessors as snsaccessors
    
    C:\g\test\lib\pandas_ml\skaccessors\__init__.py in <module>
         17 from pandas_ml.skaccessors.neighbors import NeighborsMethods                      # noqa
         18 from pandas_ml.skaccessors.pipeline import PipelineMethods                        # noqa
    ---> 19 from pandas_ml.skaccessors.preprocessing import PreprocessingMethods              # noqa
         20 from pandas_ml.skaccessors.svm import SVMMethods                                  # noqa
    
    C:\g\test\lib\pandas_ml\skaccessors\preprocessing.py in <module>
         11     _keep_col_classes = [pp.Binarizer,
         12                          pp.FunctionTransformer,
    ---> 13                          pp.Imputer,
         14                          pp.KernelCenterer,
         15                          pp.LabelEncoder,
    
    AttributeError: module 'sklearn.preprocessing' has no attribute 'Imputer'
    
    opened by apiszcz 11
Releases(v0.6.1)
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec

Chip Huyen 3.3k Jan 05, 2023
vortex particles for simulating smoke in 2d

vortex-particles-method-2d vortex particles for simulating smoke in 2d -vortexparticles_s

12 Aug 23, 2022
The Simpsons and Machine Learning: What makes an Episode Great?

The Simpsons and Machine Learning: What makes an Episode Great? Check out my Medium article on this! PROBLEM: The Simpsons has had a decline in qualit

1 Nov 02, 2021
Machine Learning Algorithms ( Desion Tree, XG Boost, Random Forest )

implementation of machine learning Algorithms such as decision tree and random forest and xgboost on darasets then compare results for each and implement ant colony and genetic algorithms on tsp map,

Mohamadreza Rezaei 1 Jan 19, 2022
A handy tool for common machine learning models' hyper-parameter tuning.

Common machine learning models' hyperparameter tuning This repo is for a collection of hyper-parameter tuning for "common" machine learning models, in

Kevin Hu 2 Jan 27, 2022
Tangram makes it easy for programmers to train, deploy, and monitor machine learning models.

Tangram Website | Discord Tangram makes it easy for programmers to train, deploy, and monitor machine learning models. Run tangram train to train a mo

Tangram 1.4k Jan 05, 2023
Simulation of early COVID-19 using SIR model and variants (SEIR ...).

COVID-19-simulation Simulation of early COVID-19 using SIR model and variants (SEIR ...). Made by the Laboratory of Sustainable Life Assessment (GYRO)

José Paulo Pereira das Dores Savioli 1 Nov 17, 2021
Regularization and Feature Selection in Least Squares Temporal Difference Learning

Regularization and Feature Selection in Least Squares Temporal Difference Learning Description This is Python implementations of Least Angle Regressio

Mina Parham 0 Jan 18, 2022
Pandas Machine Learning and Quant Finance Library Collection

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 07, 2022
Sleep stages are classified with the help of ML. We have used 4 different ML algorithms (SVM, KNN, RF, NN) to demonstrate them

Sleep stages are classified with the help of ML. We have used 4 different ML algorithms (SVM, KNN, RF, NN) to demonstrate them.

Anirudh Edpuganti 3 Apr 03, 2022
Kalman filter library

The kalman filter framework described here is an incredibly powerful tool for any optimization problem, but particularly for visual odometry, sensor fusion localization or SLAM.

comma.ai 276 Jan 01, 2023
Price forecasting of SGB and IRFC Bonds and comparing there returns

Project_Bonds Project Title : Price forecasting of SGB and IRFC Bonds and comparing there returns. Introduction of the Project The 2008-09 global fina

Tishya S 1 Oct 28, 2021
A Software Framework for Neuromorphic Computing

A Software Framework for Neuromorphic Computing

Lava 338 Dec 26, 2022
Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm.

Naive-Bayes Spam Classificator Module is created to build a spam filter using Python and the multinomial Naive Bayes algorithm. Main goal is to code a

Viktoria Maksymiuk 1 Jun 27, 2022
Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Python Extreme Learning Machine (ELM) Python Extreme Learning Machine (ELM) is a machine learning technique used for classification/regression tasks.

Augusto Almeida 84 Nov 25, 2022
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
A simple python program that draws a tree for incrementing values using the Collatz Conjecture.

Collatz Conjecture A simple python program that draws a tree for incrementing values using the Collatz Conjecture. Values which can be edited: Length

davidgasinski 1 Oct 28, 2021
Pyomo is an object-oriented algebraic modeling language in Python for structured optimization problems.

Pyomo is a Python-based open-source software package that supports a diverse set of optimization capabilities for formulating and analyzing optimization models. Pyomo can be used to define symbolic p

Pyomo 1.4k Dec 28, 2022
A Python step-by-step primer for Machine Learning and Optimization

early-ML Presentation General Machine Learning tutorials A Python step-by-step primer for Machine Learning and Optimization This github repository gat

Dimitri Bettebghor 8 Dec 01, 2022
ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions

A library for debugging/inspecting machine learning classifiers and explaining their predictions

154 Dec 17, 2022