A data-driven approach to quantify the value of classifiers in a machine learning ensemble.

Last update: Dec 29, 2022

Overview


Shapley is a Python library for evaluating binary classifiers in a machine learning ensemble.

The library consists of various methods to compute or approximate the Shapley value of players (models) in weighted voting games (ensemble games), a class of transferable utility cooperative games. We cover exact enumeration-based computation and various widely known approximation methods from economics and computer science research papers. There is also functionality to measure the heterogeneity of the player pool based on the Shapley entropy. In addition, the framework comes with detailed documentation, an intuitive tutorial, 100% test coverage and illustrative toy examples.
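To make the game concrete: in a weighted voting game each player i has a weight w_i, a coalition S wins (v(S) = 1) when its total weight reaches the quota q, and player i's Shapley value is the probability that i is pivotal (turns a losing coalition into a winning one) when players join in a uniformly random order. The plain-Python sketch below computes this by brute-force enumeration over coalitions and adds an entropy of the resulting values. It is not the library's own solver: the function names are hypothetical, it assumes a coalition wins when its weight is at least q, and it assumes the Shapley entropy is the Shannon entropy of the Shapley values treated as a probability distribution.

```python
from itertools import combinations
from math import factorial, log


def exact_shapley(weights, quota):
    """Exact Shapley values of a weighted voting game (hypothetical helper).

    Assumption: a coalition wins when its total weight is >= quota.
    Enumerates every coalition, so the cost grows exponentially with n.
    """
    n = len(weights)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            coef = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                total = sum(weights[j] for j in coalition)
                # i is pivotal if S loses on its own but wins once i joins
                if total < quota <= total + weights[i]:
                    phi[i] += coef
    return phi


def shapley_entropy(phi):
    """Shannon entropy of Shapley values viewed as a distribution (assumed definition)."""
    return -sum(p * log(p) for p in phi if p > 0)
```

For example, `exact_shapley([0.5, 0.3, 0.2], 0.51)` gives approximately `[2/3, 1/6, 1/6]`: the heaviest player belongs to every winning coalition. Because each player requires a pass over all 2^(n-1) coalitions of the others, approximation methods become necessary for larger ensembles.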

Citing

If you find Shapley useful in your research please consider adding the following citation:

@misc{rozemberczki2021shapley,
      title = {{The Shapley Value of Classifiers in Ensemble Games}}, 
      author = {Benedek Rozemberczki and Rik Sarkar},
      year = {2021},
      eprint = {2101.02153},
      archivePrefix = {arXiv},
      primaryClass = {cs.LG}
}

A simple example

Shapley makes solving voting games easy (see the accompanying tutorial). For example, this is all it takes to solve a weighted voting game defined on the fly with permutation sampling:

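import numpy as np
from shapley import PermutationSampler

# W holds one game (row) with 7 players (columns); the weights are normalized to sum to 1
W = np.random.uniform(0, 1, (1, 7))
W = W/W.sum()
# Quota: the total weight a coalition needs in order to win
q = 0.5

# Approximate the Shapley values with Monte Carlo permutation sampling
solver = PermutationSampler()
solver.solve_game(W, q)
shapley_values = solver.get_solution()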
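PermutationSampler follows the Monte Carlo permutation sampling idea listed among the methods below (Maleki et al.): draw random orderings of the players and credit the player whose arrival first pushes the running weight up to the quota. The standalone NumPy sketch here illustrates that idea on a single 1-D weight vector. It is not the library's implementation: the function name and its `n_permutations` and `seed` parameters are hypothetical, and it assumes non-negative weights and that a coalition wins when its weight is at least q.

```python
import numpy as np


def permutation_sampling_shapley(weights, quota, n_permutations=2000, seed=0):
    """Monte Carlo estimate of Shapley values in a weighted voting game.

    Hypothetical helper. Assumes non-negative weights and that a coalition
    wins when its total weight is >= quota.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    pivot_counts = np.zeros(n)
    for _ in range(n_permutations):
        order = rng.permutation(n)            # random arrival order of the players
        running = np.cumsum(weights[order])   # coalition weight after each arrival
        # First position where the running weight reaches the quota; running is
        # non-decreasing because the weights are non-negative.
        idx = np.searchsorted(running, quota, side="left")
        if idx < n:                           # the grand coalition may still lose
            pivot_counts[order[idx]] += 1
    return pivot_counts / n_permutations
```

Since the example's `W` appears to hold one game per row, `permutation_sampling_shapley(W[0], q)` would be the comparable call. Whenever the total weight reaches q, each sampled ordering has exactly one pivotal player, so the estimates sum to 1 and their error shrinks roughly as 1/sqrt(n_permutations).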

Methods Included

The following methods are implemented:

Expected Marginal Contribution Approximation from Fatima et al.: A Linear Approximation Method for the Shapley Value
Multilinear Extension from Owen: Multilinear Extensions of Games
Monte Carlo Permutation Sampling from Maleki et al.: Bounding the Estimation Error of Sampling-based Shapley Value Approximation
Exact Enumeration from Shapley: A Value for N-Person Games
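The multilinear extension entry rests on Owen's identity: player i's Shapley value equals the integral over t in [0, 1] of the probability that i is pivotal when every other player joins independently with probability t. In a weighted voting game the weight X of the other players then has mean t * sum(w_j) and variance t * (1 - t) * sum(w_j^2) over j != i, and approximating X by a normal distribution gives a cheap estimate. The NumPy sketch below is one way to implement that idea. It is not the library's code: the function name and the `n_grid` parameter are hypothetical, it assumes a coalition wins when its weight is at least q, and it combines a normal approximation with midpoint-rule integration over t.

```python
import math

import numpy as np


def _normal_cdf(x, mean, std):
    """CDF of a normal distribution, computed with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def multilinear_extension_shapley(weights, quota, n_grid=200):
    """Normal-approximation estimate of Shapley values (hypothetical helper).

    Assumes a coalition wins when its total weight is >= quota.
    """
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    # Midpoints of n_grid equal sub-intervals of [0, 1]; they avoid t = 0 and
    # t = 1, where the variance of X is zero.
    ts = (np.arange(n_grid) + 0.5) / n_grid
    phi = np.zeros(n)
    for i in range(n):
        others = np.delete(weights, i)
        total_w, total_w2 = others.sum(), (others ** 2).sum()
        acc = 0.0
        for t in ts:
            mean = t * total_w
            std = max(math.sqrt(t * (1.0 - t) * total_w2), 1e-12)
            # i is pivotal when quota - w_i <= X < quota
            acc += _normal_cdf(quota, mean, std) - _normal_cdf(quota - weights[i], mean, std)
        phi[i] = acc / n_grid  # midpoint-rule integral over t
    total = phi.sum()
    if total == 0:
        raise ValueError("all estimates are zero; cannot normalize")
    # Exact Shapley values of a winnable game sum to 1, so rescale the estimate.
    return phi / total
```

Because it relies on a central-limit-style approximation, this estimate is typically most accurate with many players and no dominant weight. With only three players it is coarse: for weights [0.5, 0.3, 0.2] and q = 0.51 it ranks the heaviest player first but gives the two lighter players different values, even though their exact Shapley values are equal.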

Head over to our documentation for installation instructions, dataset creation, and a full list of implemented methods and available datasets. For a quick start, check out the examples in the examples/ directory.

If you notice anything unexpected, please open an issue. If you are missing a specific method, feel free to open a feature request.

Installation

$ pip install shapley

Running tests

$ python setup.py test

Running examples

$ cd examples
$ python permutation_sampler_example.py

License

MIT License

Comments

Error in running MLE example

Thank you for sharing your great work; I truly enjoyed reading it. However, I encountered an error when I tried the multilinear extension (MLE) example. The MC example seems to run fine.

$ python multilinear_extension_example.py
RuntimeWarning: invalid value encountered in true_divide
  self._Phi = self._Phi / np.sum(self._Phi, axis=1).reshape(-1, 1)
Traceback (most recent call last):
  File "multilinear_extension_example.py", line 11, in <module>
    solver.solve_game(W, q)
  File "/lib/python3.6/site-packages/shapley/solvers/multilinear_extension.py", line 34, in solve_game
    self._run_sanity_check(W, self._Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 28, in _run_sanity_check
    self._verify_distribution(Phi)
  File "/lib/python3.6/site-packages/shapley/solution_concept.py", line 22, in _verify_distribution
    assert np.sum(Phi) - Phi.shape[0] < 0.001
AssertionError
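One plausible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the RuntimeWarning suggests that at least one row of `self._Phi` summed to zero, so the row normalization produced NaN. Every comparison involving NaN is False, so the assert in `_verify_distribution` then fails. The NumPy sketch below reproduces that mechanism with a hypothetical all-zero row; `safe_normalize` is a hypothetical guard, not part of the library.

```python
import numpy as np

# Hypothetical unnormalized estimates for one game in which every entry is zero
Phi = np.zeros((1, 4))

with np.errstate(invalid="ignore"):  # silence the 0/0 warning for this demo
    normalized = Phi / np.sum(Phi, axis=1).reshape(-1, 1)

print(normalized)  # 0/0 yields nan in every entry
# The check quoted in the traceback: nan - 1 < 0.001 is False, so the assert fails
check_passes = bool(np.sum(normalized) - normalized.shape[0] < 0.001)
print(check_passes)


def safe_normalize(Phi):
    """Hypothetical guard: fail loudly instead of silently producing NaN."""
    row_sums = Phi.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("a game has all-zero estimates; cannot normalize")
    return Phi / row_sums
```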

Opened by xxlya.

Releases (latest: v_10003)

v_10003 (Apr 28, 2022)
Moves the Shapley library to a design based on abstract base classes (ABCs).

Adds a version attribute.

v_10002 (May 16, 2021)

v_10001 (Feb 1, 2021)
Fixed the expectations and variances.

v_10000 (Dec 31, 2020)

The official first release of Shapley.

Owner

Benedek Rozemberczki
