Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.

Overview

physt Physt logo

P(i/y)thon h(i/y)stograms. Inspired (and based on) numpy.histogram, but designed for humans(TM) on steroids(TM).

The goal is to unify different concepts of histograms as occurring in numpy, pandas, matplotlib, ROOT, etc. and to create one representation that is easily manipulated with from the data point of view and at the same time provides nice integration into IPython notebook and various plotting options. In short, whatever you want to do with histograms, physt aims to be on your side.

Note: bokeh plotting backend has been discontinued (due to external library being redesigned.)

Travis ReadTheDocs Join the chat at https://gitter.im/physt/Lobby PyPI version Anaconda-Server Badge Anaconda-Server Badge

Versioning

  • Versions 0.3.x support Python 2.7 (no new releases in 2019)
  • Versions 0.4.x support Python 3.5+ while continuing the 0.3 API
  • Versions 0.4.9+ support only Python 3.6+ while continuing the 0.3 API
  • Versions 0.5.x slightly change the interpretation of *args in h1, h2, ...

Simple example

from physt import h1

# Create the sample
heights = [160, 155, 156, 198, 177, 168, 191, 183, 184, 179, 178, 172, 173, 175,
           172, 177, 176, 175, 174, 173, 174, 175, 177, 169, 168, 164, 175, 188,
           178, 174, 173, 181, 185, 166, 162, 163, 171, 165, 180, 189, 166, 163,
           172, 173, 174, 183, 184, 161, 162, 168, 169, 174, 176, 170, 169, 165]

hist = h1(heights, 10)           # <--- get the histogram data
hist << 190                      # <--- add a forgotten value
hist.plot()                      # <--- and plot it

Heights plot

2D example

from physt import h2
import seaborn as sns

iris = sns.load_dataset('iris')
iris_hist = h2(iris["sepal_length"], iris["sepal_width"], "human", bin_count=[12, 7], name="Iris")
iris_hist.plot(show_zero=False, cmap="gray_r", show_values=True);

Iris 2D plot

3D directional example

import numpy as np
from physt import special_histograms

# Generate some sample data
data = np.empty((1000, 3))
data[:,0] = np.random.normal(0, 1, 1000)
data[:,1] = np.random.normal(0, 1.3, 1000)
data[:,2] = np.random.normal(1, .6, 1000)

# Get histogram data (in spherical coordinates)
h = special_histograms.spherical(data)                 

# And plot its projection on a globe
h.projection("theta", "phi").plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")   

Directional 3D plot

See more in docstring's and notebooks:

Installation

Using pip:

pip install physt

Features

Implemented

  • 1D histograms
  • 2D histograms
  • ND histograms
  • Some special histograms
    • 2D polar coordinates (with plotting)
    • 3D spherical / cylindrical coordinates (beta)
  • Adaptive rebinning for on-line filling of unknown data (beta)
  • Non-consecutive bins
  • Memory-effective histogramming of dask arrays (beta)
  • Understands any numpy-array-like object
  • Keep underflow / overflow / missed bins
  • Basic numeric operations (* / + -)
  • Items / slice selection (including mask arrays)
  • Add new values (fill, fill_n)
  • Cumulative values, densities
  • Simple statistics for original data (mean, std, sem)
  • Plotting with several backends
    • matplotlib (static plots with many options)
    • vega (interactive plots, beta, help wanted!)
    • folium (experimental for geo-data)
    • plotly (very basic, help wanted!)
    • ascii (experimental)
  • Algorithms for optimized binning
    • human-friendly
    • mathematical
  • IO, conversions
    • I/O JSON
    • I/O xarray.DataSet (experimental)
    • O ROOT file (experimental)
    • O pandas.DataFrame (basic)

Planned

  • Rebinning
    • using reference to original data?
    • merging bins
  • Statistics (based on original data)?
  • Stacked histograms (with names)
  • Potentially holoviews plotting backend (instead of the discontinued bokeh one)

Not planned

  • Kernel density estimates - use your favourite statistics package (like seaborn)
  • Rebinning using interpolation - it should be trivial to use rebin (https://github.com/jhykes/rebin) with physt

Rationale (for both): physt is dumb, but precise.

Dependencies

  • Python 3.5+
  • numpy
  • (optional) matplotlib - simple output
  • (optional) xarray - I/O
  • (optional) protobuf - I/O
  • (optional) uproot - I/O
  • (optional) astropy - additional binning algorithms
  • (optional) folium - map plotting
  • (optional) vega3 - for vega in-line in IPython notebook (note that to generate vega JSON, this is not necessary)
  • (optional) asciiplotlib - for ASCII bar plots
  • (optional) xtermcolot - for ASCII color maps
  • (testing) py.test, pandas
  • (docs) sphinx, sphinx_rtd_theme, ipython

Publicity

Talk at PyData Berlin 2018:

Contribution

I am looking for anyone interested in using / developing physt. You can contribute by reporting errors, implementing missing features and suggest new one.

Thanks to:

Patches:

Alternatives and inspirations

Comments
  • python 2.7 plotting is not working

    python 2.7 plotting is not working

    When runnin plot() function I get the error below even though matplotlib is installed. Also the algorithm is pretty slow when running on something bigger than toy example.

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 137, in __call__
        return plot(self.histogram, kind=kind, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 91, in plot
        backend_name, backend = _get_backend(backend)
      File "/usr/local/lib/python2.7/dist-packages/physt/plotting/__init__.py", line 70, in _get_backend
        raise RuntimeError("No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).")
    RuntimeError: No plotting backend available. Please, install matplotlib (preferred) or bokeh (limited).
    
    bug 
    opened by romange 13
  • Smooth polar histograms?

    Smooth polar histograms?

    Thanks for writing this awesome library!

    I have a question regarding smoothing of polar 2D histograms. I am constructing a histogram like described on this page https://physt.readthedocs.io/en/latest/special_histograms.html#Polar-histogram and now I want to smooth it with a Gaussian kernel (like scipy.ndimage.gaussian_filter). What is the most elegant / correct method to do that?

    question 
    opened by horsto 7
  • Rebinning histograms related project

    Rebinning histograms related project

    Hi I found a project on rebinning histogram at https://github.com/jhykes/rebin and I opened an issue (jhykes/rebin#5) on that project page asking about integrating his code to this project. I hope you will appreciate it.

    enhancement idea? 
    opened by DancingQuanta 7
  • Option to center labels on bins

    Option to center labels on bins

    If you have a large dataset with a small number of values (such as consisting only of integers 1-10) then it would be nice to have the bin x-axis labels at the center under the respective bin instead of at the bin edges.

    I recognise this case is more of a 'histogram as bar plot' kind of thing, but it is a use-case I have often.

    opened by nzjrs 5
  • Usage of spherical histogram

    Usage of spherical histogram

    Hi, I have tried the example of spherical histogram. After a small modification of the code (normalized the data as unit vectors),

    n = 100 data = np.empty((n, 3)) data[:,0] = np.random.normal(0, 1, n) data[:,1] = np.random.normal(0, 1, n) data[:,2] = np.random.normal(0, 1, n) for i in range(n): scale = np.sqrt(data[i,0]**2 + data[i,1]**2 + data[i,2]**2) data[i,0] = data[i,0]/scale data[i,1] = data[i,1]/scale data[i,2] = data[i,2]/scale

    h = special.spherical_histogram(data, theta_bins=20, phi_bins=20) ax.scatter(data[:,0], data[:,1], data[:,2])

    globe = h.projection("theta", "phi") globe.plot.globe_map(density=True, figsize=(7, 7), cmap="rainbow")

    plt.show()

    I got an error: “RuntimeError: Bins not in rising order.” What did I do wrong? Thank you for your support.

    question 
    opened by zhengpuchen 3
  • approximate histograms

    approximate histograms

    I'm following the paper (http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) implemented by https://github.com/carsonfarmer/streamhist, and the notion of approximate histograms seems elegant and efficient.

    After seeing the internals of streamhist (trying to fix bugs) and reading the paper, I can imagine ways to make a better implementation: e.g. much more efficient discovery of bins to be joined, and avoiding temporary lists when possible. Also the code seems overly complex, partially due to features like "bin freezing" which try to workaround poor bin joining performance.

    Anyway since streamhist is defunct, I'm thinking about trying an implementation. I wonder if this kind of histogram would fit into physt (and if sortedcollections would be reasonable as a dependency).

    opened by belm0 3
  • please make this library discoverable

    please make this library discoverable

    name: physt (?) github tag line: P(i/y)thon h(i/y)stograms (???)

    google search for "python streaming histogram"

    • top result is https://github.com/carsonfarmer/streamhist (unused / unmaintained)
    • physt not in initial 10 pages of results...

    For over a year I've wanted to find a Python library which supports efficient histogram updates without a bunch of ugly dependencies. I've searched many times. Today I happened to get lucky by seeing physt mentioned at the bottom of a SO question (https://stackoverflow.com/questions/40627274/).

    To improve discoverability by search, please consider updating the github tag line to concisely and accurately describe the library (... rather than be cute).

    opened by belm0 2
  • Warning in current numpy

    Warning in current numpy

    If you try to merge bins:

    from physt import h2
    from scipy.stats import multivariate_normal
    hist = h2(*multivariate_normal.rvs((0,0), size=100_000).T, bins=100)
    hist.merge_bins(2)
    

    You get a warning from numpy:

    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:572: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_frequencies[new_index] += old_frequencies[old_index]
    /home/schreihf/.local/lib/python3.7/site-packages/physt/histogram_base.py:573: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
      new_errors2[new_index] += old_errors2[old_index]
    
    opened by henryiii 2
  • Add 2D & ND histograms

    Add 2D & ND histograms

    • [x] Analogous data model to Histogram1D
    • [x] refactor HistogramBase class -> common behaviour of 1D and 2D
    • [x] revisit binning schemas
    • [x] histogram2D facade function to be compatible with numpy one
    • [x] plotting
    • [x] arithmetic operations
    • [x] documentation
    • [ ] stats
    enhancement 
    opened by janpipek 2
  • ImportError with newer plotly

    ImportError with newer plotly

    [SOMEDIR}\physt\physt\plotting\plotly.py in <module>
         12 
         13 import plotly.offline as pyo
    ---> 14 import plotly.plotly as pyp
         15 import plotly.graph_objs as go
         16 
    
    ~\Miniconda3\lib\site-packages\plotly\plotly\__init__.py in <module>
          2 from _plotly_future_ import _chart_studio_error
          3 
    ----> 4 _chart_studio_error("plotly")
    
    ~\Miniconda3\lib\site-packages\_plotly_future_\__init__.py in _chart_studio_error(submodule)
         41 
         42 def _chart_studio_error(submodule):
    ---> 43     raise ImportError(
         44         """
         45 The plotly.{submodule} module is deprecated,
    
    ImportError: 
    The plotly.plotly module is deprecated,
    please install the chart-studio package and use the
    chart_studio.plotly module instead. 
    
    bug visualization 
    opened by janpipek 1
  • Wrong bars center in polar_map

    Wrong bars center in polar_map

    I have found that the bars in polar_map are centered on the left edge of the phi bins instead of their center. Because of this, the representation of the histogram does not coincide with the data, as in the figure below: polarmap_wrong

    I think this can be easily solved by replacing

    bars = ax.bar(phipos[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    with

    bars = ax.bar(phipos[i] + 0.5*dphi[i], dr[i], width=dphi[i], bottom=rpos[i], color=bin_color,

    in the definition of polar_map.

    By the way, thank you for this amazing package!

    bug visualization 
    opened by ruhugu 1
  • Be more explicit about bins too narrow for float representation

    Be more explicit about bins too narrow for float representation

    If the computed range for the binning divided by the number of bins is lower than the minimum float difference at the scale, we receive an error [ValueError: Bins not in rising order.] which is not very informative.

    To reproduce:

    data = [1, np.nextafter(1, 2)]
    physt.h1(data)
    

    It also happens when the range is 0, like in:

    data = [1, 1]
    physt.h1(data)
    
    enhancement 
    opened by janpipek 1
Releases(v0.5.2)
Owner
Jan Pipek
PyData Prague
Jan Pipek
🌀❄️🌩️ This repository contains some examples for creating 2d and 3d weather plots using matplotlib and cartopy libraries in python3.

Weather-Plotting 🌀 ❄️ 🌩️ This repository contains some examples for creating 2d and 3d weather plots using matplotlib and cartopy libraries in pytho

Giannis Dravilas 21 Dec 10, 2022
Generate knowledge graphs with interesting geometries, like lattices

Geometric Graphs Generate knowledge graphs with interesting geometries, like lattices. Works on Python 3.9+ because it uses cool new features. Get out

Charles Tapley Hoyt 5 Jan 03, 2022
Simple and lightweight Spotify Overlay written in Python.

Simple Spotify Overlay This is a simple yet powerful Spotify Overlay. About I have been looking for something like this ever since I got Spotify. I th

27 Sep 03, 2022
Data aggregated from the reports found at the MCPS COVID Dashboard into a set of visualizations.

Montgomery County Public Schools COVID-19 Visualizer Contents About this project Data Support this project About this project Data All data we use can

James 3 Jan 19, 2022
D-Analyst : High Performance Visualization Tool

D-Analyst : High Performance Visualization Tool D-Analyst is a high performance data visualization built with python and based on OpenGL. It allows to

4 Apr 14, 2022
Visualizations of some specific solutions of different differential equations.

Diff_sims Visualizations of some specific solutions of different differential equations. Heat Equation in 1 Dimension (A very beautiful and elegant ex

2 Jan 13, 2022
finds grocery stores and stuff next to route (gpx)

Route-Report Route report is a command-line utility that can be used to locate points-of-interest near your planned route (gpx). The results are based

Clemens Mosig 5 Oct 10, 2022
Plot toolbox based on Matplotlib, simple and elegant.

Elegant-Plot Plot toolbox based on Matplotlib, simple and elegant. 绘制效果 绘制过程 数据准备 每种图标类型的目录下有data.csv文件,依据样例数据填入自己的数据。

3 Jul 15, 2022
Open-questions - Open questions for Bellingcat technical contributors

Open questions for Bellingcat technical contributors These are difficult, long-term projects that would contribute to open source investigations at Be

Bellingcat 234 Dec 31, 2022
Domain Connectivity Analysis Tools to analyze aggregate connectivity patterns across a set of domains during security investigations

DomainCAT (Domain Connectivity Analysis Tool) Domain Connectivity Analysis Tool is used to analyze aggregate connectivity patterns across a set of dom

DomainTools 34 Dec 09, 2022
Draw interactive NetworkX graphs with Altair

nx_altair Draw NetworkX graphs with Altair nx_altair offers a similar draw API to NetworkX but returns Altair Charts instead. If you'd like to contrib

Zachary Sailer 206 Dec 12, 2022
又一个云探针

ServerStatus-Murasame 感谢ServerStatus-Hotaru,又一个云探针诞生了(大雾 本项目在ServerStatus-Hotaru的基础上使用fastapi重构了服务端,部分修改了客户端与前端 项目还在非常原始的阶段,可能存在严重的问题 演示站:https://stat

6 Oct 19, 2021
ipyvizzu - Jupyter notebook integration of Vizzu

ipyvizzu - Jupyter notebook integration of Vizzu. Tutorial · Examples · Repository About The Project ipyvizzu is the Jupyter Notebook integration of V

Vizzu 729 Jan 08, 2023
Here I plotted data for the average test scores across schools and class sizes across school districts.

HW_02 Here I plotted data for the average test scores across schools and class sizes across school districts. Average Test Score by Race This graph re

7 Oct 27, 2021
eoplatform is a Python package that aims to simplify Remote Sensing Earth Observation by providing actionable information on a wide swath of RS platforms and provide a simple API for downloading and visualizing RS imagery

An Earth Observation Platform Earth Observation made easy. Report Bug | Request Feature About eoplatform is a Python package that aims to simplify Rem

Matthew Tralka 4 Aug 11, 2022
A guide for using Bootstrap 5 classes in Dash Bootstrap Components V1

dash-bootstrap-cheatsheet This handy interactive cheatsheet makes it easy to use the Bootstrap 5 classes with your Dash app made with the latest versi

10 Dec 22, 2022
Simple, realtime visualization of neural network training performance.

pastalog Simple, realtime visualization server for training neural networks. Use with Lasagne, Keras, Tensorflow, Torch, Theano, and basically everyth

Rewon Child 416 Dec 29, 2022
A GUI for Pandas DataFrames

PandasGUI A GUI for analyzing Pandas DataFrames. Demo Installation Install latest release from PyPi: pip install pandasgui Install directly from Githu

Adam 2.8k Jan 03, 2023
Geocoding library for Python.

geopy geopy is a Python client for several popular geocoding web services. geopy makes it easy for Python developers to locate the coordinates of addr

geopy 3.8k Jan 02, 2023
A set of three functions, useful in geographical calculations of different sorts

GreatCircle A set of three functions, useful in geographical calculations of different sorts. Available for PHP, Python, Javascript and Ruby. Live dem

72 Sep 30, 2022