pandas, scikit-learn, xgboost and seaborn integration

Last update: Dec 30, 2022

Related tags

Machine Learning pandas-ml

Overview

pandas-ml

pandas, scikit-learn and xgboost integration.

Installation

$ pip install pandas_ml

Documentation

http://pandas-ml.readthedocs.org/en/stable/

Example

>>> import pandas_ml as pdml
>>> import sklearn.datasets as datasets

# create ModelFrame instance from sklearn.datasets
>>> df = pdml.ModelFrame(datasets.load_digits())
>>> type(df)
<class 'pandas_ml.core.frame.ModelFrame'>

# binarize data (features), not touching target
>>> df.data = df.data.preprocessing.binarize()
>>> df.head()
   .target  0  1  2  3  4  5  6  7  8 ...  54  55  56  57  58  59  60  61  62  63
0        0  0  0  1  1  1  1  0  0  0 ...   0   0   0   0   1   1   1   0   0   0
1        1  0  0  0  1  1  1  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
2        2  0  0  0  1  1  1  0  0  0 ...   1   0   0   0   0   1   1   1   1   0
3        3  0  0  1  1  1  1  0  0  0 ...   1   0   0   0   1   1   1   1   0   0
4        4  0  0  0  1  1  0  0  0  0 ...   0   0   0   0   0   1   1   1   0   0
[5 rows x 65 columns]

# split into training and test data
>>> train_df, test_df = df.model_selection.train_test_split()

# create estimator (accessor is mapped to sklearn namespace)
>>> estimator = df.svm.LinearSVC()

# fit to training data
>>> train_df.fit(estimator)

# predict test data
>>> test_df.predict(estimator)
0     4
1     2
2     7
...
448    5
449    8
Length: 450, dtype: int64

# evaluate the result
>>> test_df.metrics.confusion_matrix()
Predicted   0   1   2   3   4   5   6   7   8   9
Target
0          52   0   0   0   0   0   0   0   0   0
1           0  37   1   0   0   1   0   0   3   3
2           0   2  48   1   0   0   0   1   1   0
3           1   1   0  44   0   1   0   0   3   1
4           1   0   0   0  43   0   1   0   0   0
5           0   1   0   0   0  39   0   0   0   0
6           0   1   0   0   1   0  35   0   0   0
7           0   0   0   0   2   0   0  42   1   0
8           0   2   1   0   1   0   0   0  33   1
9           0   2   1   2   0   0   0   0   1  38
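
Because each accessor is mapped to the matching scikit-learn namespace, the session above can be compared with a plain scikit-learn version. The following is only a minimal sketch under that assumption, not pandas-ml's own implementation. The variable names, the fixed random_state and the max_iter value are illustrative choices, and the counts it prints will differ from the matrix shown above.

```python
import pandas as pd
from sklearn import datasets, metrics, model_selection, preprocessing, svm

digits = datasets.load_digits()

# binarize the features only; the target is left untouched
X = preprocessing.binarize(digits.data)
y = digits.target

# the default split keeps 25% of the rows for testing
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, random_state=0  # random_state is an illustrative assumption
)

# max_iter is raised only to reduce convergence warnings in this sketch
estimator = svm.LinearSVC(max_iter=5000)
estimator.fit(X_train, y_train)
predicted = estimator.predict(X_test)

# label rows and columns the way the ModelFrame output does
labels = list(range(10))
cm = pd.DataFrame(
    metrics.confusion_matrix(y_test, predicted, labels=labels),
    index=pd.Index(labels, name="Target"),
    columns=pd.Index(labels, name="Predicted"),
)
print(cm)
```

The diagonal holds the correctly classified digits, and every off-diagonal cell counts one kind of confusion between two digits.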

Supported Packages

scikit-learn
patsy
xgboost

Comments

Fixed imports of deprecated modules which were removed in pandas 0.24.0

Certain functions were deprecated in a previous version of pandas and moved to a different module (see #117). This PR fixes the imports of those functions.

opened by kristofve 8
REL: v0.4.0
[x] Compat/test for sklearn 0.18.0 (#81)

[x] initial fix (#81)

[x] wrapper for cross validation classes (re-enable skipped tests) (#85)

[x] tests for multioutput (#86)

[x] Update doc

[x] Compat/test for pandas 0.19.0 (#83)

[x] Update release note (#88)
opened by sinhrks 4
Import error

I tried to import pandas_ml but it gave the error:

AttributeError: type object 'NDFrame' has no attribute 'groupby'

I'm running Python 3.8.1 and I installed pandas_ml via pip (pip version 20.0.2).

I dug into the code; the error is at line 80 of series.py:

@Appender(pd.core.generic.NDFrame.groupby.__doc__)

Here pandas is imported at the top of the file with the usual import pandas as pd.

I guess there is a problem with the versions...

Thanks in advance for any help

opened by ierezell 2
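
One way to narrow down a report like this is to check whether the installed pandas still defines groupby on NDFrame. The sketch below only inspects the installed pandas and does not touch pandas_ml. Reading the error as a pandas version mismatch is an assumption taken from the poster's own guess, not a confirmed diagnosis.

```python
import pandas as pd
from pandas.core.generic import NDFrame

# True only on pandas releases where NDFrame itself defines groupby
ndframe_has_groupby = hasattr(NDFrame, "groupby")
print("pandas", pd.__version__, "NDFrame.groupby present:", ndframe_has_groupby)

# The concrete classes expose groupby (and its docstring) either way,
# so a decorator could read the docstring from them instead.
frame_groupby_doc = pd.DataFrame.groupby.__doc__
series_groupby_doc = pd.Series.groupby.__doc__
```

If the flag prints False, the decorator on line 80 of series.py cannot resolve the attribute it asks for, which would match the AttributeError quoted above.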
Confusion Matrix not accessible

Hi,

I've been using confusion_matrix since it was an independent package. I've installed pandas_ml to continue using the package, but it seems that the setup.py script does not install the package.

Could it be an issue with the find_packages function?

opened by mmartinortiz 2

Seaborn Scatterplot matrix / pairplot integration

import seaborn as sns
sns.set()

df = sns.load_dataset("iris")
sns.pairplot(df, hue="species")

displays

iris_scatter_matrix

but pairplot doesn't work the same way with ModelFrame

import pandas as pd
pd.set_option('display.max_rows', 10)  # full option name; the short form can be ambiguous
import sklearn.datasets as datasets
import pandas_ml as pdml  # https://github.com/pandas-ml/pandas-ml
import seaborn as sns
import matplotlib.pyplot as plt
df = pdml.ModelFrame(datasets.load_iris())
sns.pairplot(df, hue=".target")

iris_modelframe

There are some useless subplots.

opened by scls19fr 2

Error while running train.py from speech commands in tensorflow examples.

Have the following error:

  File "train.py", line 27, in <module>
    from callbacks import ConfusionMatrixCallback
  File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
    from pandas_ml import ConfusionMatrix
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
    from pandas_ml.core import ModelFrame, ModelSeries # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
    from pandas_ml.core.frame import ModelFrame # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
    from pandas_ml.core.series import ModelSeries
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
    class ModelSeries(ModelTransformer, pd.Series):
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
    @Appender(pd.core.generic.NDFrame.groupby.__doc__)
AttributeError: type object 'NDFrame' has no attribute 'groupby'

Happening with both version 5 and 6.1

opened by ayush7 1
error for example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

code from example https://pandas-ml.readthedocs.io/en/latest/xgboost.html

'''
import pandas_ml as pdml
import sklearn.datasets as datasets
df = pdml.ModelFrame(datasets.load_digits())
train_df, test_df = df.cross_validation.train_test_split()
estimator = df.xgboost.XGBClassifier()
train_df.fit(estimator)
predicted = test_df.predict(estimator)
q=1
test_df.metrics.confusion_matrix()
train_df.xgboost.plot_importance()

tuned_parameters = [{'max_depth': [3, 4]}]
cv = df.grid_search.GridSearchCV(df.xgb.XGBClassifier(), tuned_parameters, cv=5)

df.fit(cv)
df.grid_search.describe(cv)
q=1
'''

gives error

'''
  File "E:\Pandas\my_code\S_pandas_ml_feb27.py", line 10, in
    train_df.xgboost.plot_importance()
  File "C:\Users\sndr\Anaconda3\Lib\site-packages\pandas_ml\xgboost\base.py", line 61, in plot_importance
    return xgb.plot_importance(self._df.estimator.booster(),

builtins.TypeError: 'str' object is not callable
'''

I use Windows and Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)].

opened by Sandy4321 1
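
The traceback shows estimator.booster being called as a method while it holds a string. In newer xgboost releases the scikit-learn wrappers take booster as a plain string parameter and expose the fitted model through get_booster(). The sketch below shows one defensive way to pick whichever accessor is callable. The helper name resolve_booster and both stand-in classes are hypothetical and are not part of pandas-ml or xgboost.

```python
def resolve_booster(estimator):
    """Return the fitted booster from either accessor style."""
    getter = getattr(estimator, "get_booster", None)
    if callable(getter):
        return getter()
    legacy = getattr(estimator, "booster", None)
    if callable(legacy):
        return legacy()
    raise TypeError("estimator does not expose a callable booster accessor")


class NewStyleEstimator:
    """Hypothetical stand-in: booster is a string, get_booster() is the method."""

    booster = "gbtree"

    def get_booster(self):
        return "fitted-booster"


class OldStyleEstimator:
    """Hypothetical stand-in: booster() is itself the accessor method."""

    def booster(self):
        return "fitted-booster"


print(resolve_booster(NewStyleEstimator()))
print(resolve_booster(OldStyleEstimator()))
```

Calling booster() on the first stand-in directly would raise the same "'str' object is not callable" error as in the report.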
pandas 0.24.0 has deprecated pandas.util.decorators

See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#deprecations

This causes the import statement in https://github.com/pandas-ml/pandas-ml/blob/master/pandas_ml/core/frame.py to break.

It looks like the import just needs to be changed to 'from pandas.utils'

opened by usul83 1
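
A version-tolerant import is one way to handle a module that moved between pandas releases. The sketch below assumes the decorator in question is Appender (the one used in series.py in the tracebacks on this page) and that newer pandas keeps it in the private module pandas.util._decorators. It is an illustration only, not the fix that was merged into pandas-ml.

```python
try:
    # private location used by newer pandas releases
    from pandas.util._decorators import Appender
except ImportError:
    # location that older pandas releases provided
    from pandas.util.decorators import Appender


@Appender(" Extra text appended by the decorator.")
def example():
    """Base docstring."""


print(example.__doc__)
```

Relying on a private module is fragile, so pinning a pandas version that is known to work is the more conservative alternative.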
'mean_absoloute_error

from sklearn import metrics
print('MAE:', metrics.mean_absoloute_error(y_test, y_pred))

module 'sklearn.metrics' has no attribute 'mean_absoloute_error'

This error occurred. Is there any solution?

opened by vikramk1507 0
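
The attribute name in the report is misspelled; scikit-learn's function is mean_absolute_error. A minimal sketch with made-up values standing in for the poster's y_test and y_pred:

```python
from sklearn import metrics

# hypothetical values standing in for the poster's y_test / y_pred
y_test = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

# absolute errors are 0.5, 0.5, 0.0 and 1.0, so their mean is 0.5
mae = metrics.mean_absolute_error(y_test, y_pred)
print('MAE:', mae)
```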
AttributeError: type object 'NDFrame' has no attribute 'groupby'

AttributeError: type object 'NDFrame' has no attribute 'groupby'

from pandas_ml import ConfusionMatrix
cm = ConfusionMatrix(actu, pred)
cm.print_stats()

AttributeError                            Traceback (most recent call last)
----> 1 from pandas_ml import confusion_matrix
      2
      3 cm = ConfusionMatrix(actu, pred)
      4 cm.print_stats()

/usr/local/lib/python3.8/site-packages/pandas_ml/__init__.py in <module>
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core import ModelFrame, ModelSeries # noqa
      4 from pandas_ml.tools import info # noqa
      5 from pandas_ml.version import version as __version__ # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/__init__.py in <module>
      1 #!/usr/bin/env python
      2
----> 3 from pandas_ml.core.frame import ModelFrame # noqa
      4 from pandas_ml.core.series import ModelSeries # noqa

/usr/local/lib/python3.8/site-packages/pandas_ml/core/frame.py in <module>
     16 from pandas_ml.core.accessor import _AccessorMethods
     17 from pandas_ml.core.generic import ModelPredictor, _shared_docs
---> 18 from pandas_ml.core.series import ModelSeries
     19
     20

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in <module>
      9
     10
---> 11 class ModelSeries(ModelTransformer, pd.Series):
     12     """
     13     Wrapper for pandas.Series to support sklearn.preprocessing

/usr/local/lib/python3.8/site-packages/pandas_ml/core/series.py in ModelSeries()
     78         return df
     79
---> 80     @Appender(pd.core.generic.NDFrame.groupby.__doc__)
     81     def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
     82                 group_keys=True, squeeze=False):

AttributeError: type object 'NDFrame' has no attribute 'groupby'

opened by gfranco008 5
AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score'

I am using scikit-learn version 0.23.1 and I get the following error: AttributeError: module 'sklearn.metrics' has no attribute 'jaccard_similarity_score' when calling the function ConfusionMatrix.

opened by petraknovak 11
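
scikit-learn's replacement for the removed function is jaccard_score. The sketch below calls it on a small made-up binary example. The two functions are not computed the same way for binary and multiclass inputs, so swapping one name for the other can change the reported value. How ConfusionMatrix should adopt it is not shown in the issue and is left open here.

```python
from sklearn import metrics

# hypothetical binary labels, for illustration only
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# positives overlap at 2 positions out of 3 in the union, giving 2/3
score = metrics.jaccard_score(y_true, y_pred)
print(score)
```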
Error while running train.py from speech commands in tensorflow examples. AttributeError: type object 'NDFrame' has no attribute 'groupby'

Have the following error:

  File "train.py", line 27, in <module>
    from callbacks import ConfusionMatrixCallback
  File "/home/tesseract/ayush_workspace/NLP/WakeWord/tensorflow_trainer/ml/callbacks.py", line 21, in <module>
    from pandas_ml import ConfusionMatrix
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/__init__.py", line 3, in <module>
    from pandas_ml.core import ModelFrame, ModelSeries # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/__init__.py", line 3, in <module>
    from pandas_ml.core.frame import ModelFrame # noqa
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/frame.py", line 18, in <module>
    from pandas_ml.core.series import ModelSeries
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 11, in <module>
    class ModelSeries(ModelTransformer, pd.Series):
  File "/home/tesseract/anaconda3/envs/ciao/lib/python3.6/site-packages/pandas_ml/core/series.py", line 80, in ModelSeries
    @Appender(pd.core.generic.NDFrame.groupby.__doc__)
AttributeError: type object 'NDFrame' has no attribute 'groupby'

Happening with both version 5 and 6.1

opened by ayush7 3

Pandas 1.0.0rc0/0.6.1 module 'sklearn.preprocessing' has no attribute 'Imputer'

SKLEARN

sklearn.preprocessing.Imputer Warning DEPRECATED

class sklearn.preprocessing.Imputer(*args, **kwargs)[source] Imputation transformer for completing missing values.
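
In scikit-learn releases that no longer ship sklearn.preprocessing.Imputer, the comparable transformer is sklearn.impute.SimpleImputer. The sketch below uses it directly on a small made-up array. Whether pandas-ml's preprocessing accessor can be pointed at it is not covered by this issue.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# made-up data with one missing value in the first column
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])

# replace each missing value with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

The missing entry is filled with the mean of the remaining values in its column, (1.0 + 7.0) / 2 = 4.0.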

Releases(v0.6.1)

v0.6.1(Mar 5, 2019)

v0.6.0(Jan 15, 2019)

v0.5.0(Nov 16, 2017)

v0.4.0(Oct 15, 2016)
Support scikit-learn v0.17.x and v0.18.0.

Support imbalanced-learn via .imbalance accessor.

Added pandas_ml.ConfusionMatrix class for easier classification results evaluation.

v0.3.0(Oct 22, 2015)

v0.2.0(Sep 12, 2015)

v0.1.1(Mar 13, 2015)

v0.1.0(Mar 7, 2015)

v0.0.1(Mar 1, 2015)

Relevance Vector Machine implementation using the scikit-learn API.

scikit-rvm scikit-rvm is a Python module implementing the Relevance Vector Machine (RVM) machine learning technique using the scikit-learn API. Quicks

204 Nov 18, 2022

Turning images into '9-pan' palettes using KMeans clustering from sklearn.

img2palette Turning images into '9-pan' palettes using KMeans clustering from sklearn. Requirements We require: Pillow, for opening and processing ima

2 Jan 01, 2022

easyNeuron is a simple way to create powerful machine learning models, analyze data and research cutting-edge AI.

5 Jun 18, 2022

A simple machine learning package to cluster keywords in higher-level groups.

Simple Keyword Clusterer A simple machine learning package to cluster keywords in higher-level groups. Example: "Senior Frontend Engineer" -- "Fronte

10 Dec 18, 2022

Python factor analysis library (PCA, CA, MCA, MFA, FAMD)

Prince is a library for doing factor analysis. This includes a variety of methods including principal component analysis (PCA) and correspondence anal

915 Dec 31, 2022

A naive Bayes model for cancer classification using a set of documents

Naive Bayes text classification model for cancer and noncancer documents Author: Alex King Purpose Requirements/files included How to use 1. Purpose The

1 Nov 24, 2021

Forecasting prices using Facebook/Meta's Prophet model

CryptoForecasting using Machine and Deep learning (Part 1) CryptoForecasting using Machine Learning The main aspect of predicting the stock-related da

1 Nov 27, 2021

ML-based project that uses regression techniques to predict the price.

Price-Predictor ML-based project that uses regression techniques to predict the price. I have used various regression models and find the model with

1 Jul 09, 2022

Case studies with Bayesian methods

8 Nov 26, 2022

Automated machine learning: Review of the state-of-the-art and opportunities for healthcare

42 Dec 23, 2022

A Software Framework for Neuromorphic Computing

338 Dec 26, 2022

MLFlow in a Dockercontainer based on Azurite and Postgres

mlflow-azurite-postgres docker This is a MLFLow image which works with a postgres DB and a local Azure Blob Storage Instance (Azurite). This image is

2 May 29, 2022

MIT-Machine Learning with Python–From Linear Models to Deep Learning

MIT-Machine Learning with Python–From Linear Models to Deep Learning | One of the 5 courses in MIT MicroMasters in Statistics & Data Science Welcome t

2 Aug 23, 2022

dirty_cat is a Python module for machine-learning on dirty categorical variables.

dirty_cat dirty_cat is a Python module for machine-learning on dirty categorical variables.

637 Dec 29, 2022

Summer: compartmental disease modelling in Python

Summer: compartmental disease modelling in Python Summer is a Python-based framework for the creation and execution of compartmental (or "state-based"

6 May 13, 2022

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and A* Search (Manhattan Distance Heuristic)

A Python-based application demonstrating various search algorithms, namely Depth-First Search (DFS), Breadth-First Search (BFS), and the A* Search (using the Manhattan Distance Heuristic)

17 Aug 14, 2022

Microsoft contributing libraries, tools, recipes, sample codes and workshop contents for machine learning & deep learning.

366 Jan 03, 2023

GRaNDPapA: Generator of Rad Names from Decent Paper Acronyms

Convoys is a simple library that fits a few statistical model useful for modeling time-lagged conversions.

Laporan Proyek Machine Learning - Azhar Rizki Zulma
