Compute histograms from XArray data using BoostHistogram

Descanonge/xarray-histogram

XArray-Histogram

Compute and manipulate histograms from XArray data using BoostHistogram

This package computes histograms from XArray data, taking advantage of XArray's label and dimension management. It relies on the Boost Histogram library for the computations.

It is essentially a thin wrapper that uses Boost Histogram directly on loaded data, or Dask-histogram on data stored in Dask arrays. It thus offers optimized performance, as well as lazy computation and easy up-scaling thanks to Dask.

Quick examples

Bins can be specified similarly to Numpy functions:

import xarray_histogram as xh
hist = xh.histogram(data, bins=[(100, 0., 10.)])

but also using boost axes, benefiting from their features:

import boost_histogram.axis as bha
hist = xh.histogram(data, bins=[bha.Regular(100, 0., 10.)])
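To make the bin specification concrete, here is a minimal sketch in plain NumPy of what a `(100, 0., 10.)` entry is assumed to mean: 100 equal-width bins between 0 and 10, mirroring the arguments of `bha.Regular(100, 0., 10.)`. The sample values are made up for illustration, and the snippet does not call xarray-histogram itself.

```python
import numpy as np

# Made-up sample values, for illustration only.
rng = np.random.default_rng(0)
values = rng.uniform(0.0, 10.0, size=1_000)

# Assumed reading of the tuple: (number of bins, start, stop).
nbins, start, stop = 100, 0.0, 10.0
counts, edges = np.histogram(values, bins=nbins, range=(start, stop))

print(counts.shape)  # one count per bin
print(edges.shape)   # one more edge than there are bins
```

A boost axis carries the same three numbers, plus optional features such as transforms, which is what the second form above gives access to.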

Multi-dimensional histograms can be computed, here in 2D for instance:

hist = xh.histogram(
    temp, chlorophyll,
    bins=[bha.Regular(100, -5., 40.), bha.Regular(100, 1e-3, 10, transform=bha.transform.log)]
)
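The log-transformed axis in this example places its bin edges evenly in log space. As a rough illustration, the sketch below builds comparable edges with NumPy and computes a 2D histogram with `np.histogram2d`. The `temp` and `chlorophyll` arrays are made-up stand-ins, and this is a NumPy analogue under those assumptions, not the code path xarray-histogram takes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
# Made-up stand-ins for the two variables.
temp = rng.uniform(-5.0, 40.0, size=n)
chlorophyll = 10.0 ** rng.uniform(-2.9, 0.9, size=n)

# Linear edges for temperature, log-spaced edges for chlorophyll.
temp_edges = np.linspace(-5.0, 40.0, 101)
chl_edges = np.geomspace(1e-3, 10.0, 101)

counts, _, _ = np.histogram2d(temp, chlorophyll, bins=[temp_edges, chl_edges])

# Neighbouring log-spaced edges share a constant ratio rather than a constant width.
ratios = chl_edges[1:] / chl_edges[:-1]
print(counts.shape)
```

Log-spaced bins are a common choice for quantities such as chlorophyll that span several orders of magnitude, since equal-width bins would pile most samples into the first few bins.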

So far we have computed histograms over the whole flattened arrays, but we can also compute them along only some dimensions. For instance, we can retrieve the time evolution of a histogram:

hist = xh.histogram(temp, bins=[bha.Regular(100, 0., 10.)], dims=['lat', 'lon'])
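To show what reducing along `lat` and `lon` implies for the output shape, here is a small NumPy sketch that histograms each time step separately. It assumes a made-up array ordered as `(time, lat, lon)` and is only an illustration of the idea, not how the package performs the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up array with dimensions (time, lat, lon).
temp = rng.uniform(0.0, 10.0, size=(4, 5, 6))

edges = np.linspace(0.0, 10.0, 101)

# Flatten lat and lon together, keeping time as the leading dimension.
flat = temp.reshape(temp.shape[0], -1)

# One histogram per time step, stacked into a (time, bins) array.
hist = np.stack([np.histogram(row, bins=edges)[0] for row in flat])

print(hist.shape)
```

The reduced dimensions disappear and a bins dimension takes their place, while the remaining dimension (time here) is preserved.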

Histograms can be normalized, and weights can be applied. All of this works seamlessly with data stored in numpy or dask arrays.
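The exact keyword arguments for normalization and weights are not shown here, so the sketch below uses NumPy's `weights` and `density` arguments to illustrate the two concepts. The data and weights are made up, and the parameter names belong to `np.histogram`, not necessarily to `xh.histogram`.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(0.0, 10.0, size=1_000)
# Made-up weights, e.g. standing in for grid-cell areas.
weights = rng.uniform(0.5, 1.5, size=values.size)

edges = np.linspace(0.0, 10.0, 101)

# Weighted histogram: each sample contributes its weight instead of 1.
weighted, _ = np.histogram(values, bins=edges, weights=weights)

# Normalized histogram: a density that integrates to 1 over the bins.
density, _ = np.histogram(values, bins=edges, weights=weights, density=True)
integral = (density * np.diff(edges)).sum()

print(weighted.sum(), integral)
```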

Accessor

An Xarray accessor is available for manipulating histogram data. Simply import xarray_histogram.accessor; all arrays can then access its methods through the hist property:

hist.hist.edges()
hist.hist.median()
hist.hist.ppf(q=0.75)

See the accessor API for more details.
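As an illustration of the kind of statistic such methods return, the following sketch estimates a quantile from bin counts and edges by linearly interpolating the cumulative distribution. The helper name `ppf_from_histogram` is hypothetical, and the within-bin uniformity assumption is mine; the accessor may compute its results differently.

```python
import numpy as np


def ppf_from_histogram(counts, edges, q):
    """Estimate the q-quantile from binned counts.

    Assumes values are spread uniformly within each bin, so the
    cumulative distribution is piecewise linear between bin edges.
    """
    counts = np.asarray(counts, dtype=float)
    # Cumulative fraction of samples at each bin edge, starting at 0.
    cdf = np.concatenate([[0.0], np.cumsum(counts) / counts.sum()])
    return np.interp(q, cdf, edges)


# Small hand-made histogram: five unit-width bins between 0 and 5.
counts = np.array([10, 20, 40, 20, 10])
edges = np.linspace(0.0, 5.0, 6)

median = ppf_from_histogram(counts, edges, 0.5)
upper_quartile = ppf_from_histogram(counts, edges, 0.75)
print(median, upper_quartile)
```

With this symmetric example the median falls in the middle of the central bin (about 2.5), and the 0.75 quantile lands a quarter of the way into the fourth bin (about 3.25).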

Requirements

Installation

From source with:

git clone https://github.com/Descanonge/xarray-histogram
cd xarray-histogram
pip install -e .

Soon on PyPI.

Documentation

Documentation available at https://xarray-histogram.readthedocs.io

Tests and performance

To compare performance, see the notebooks for NumPy and Dask arrays.

Other packages

xhistogram already exists. It relies on NumPy functions and thus does not benefit from some of the performance improvements brought by Boost (see performance comparisons). I also hoped to bring similar features with simpler code by relying on dependencies. Some additional features of Boost (overflow bins, rebinning, extracting various statistics from the DataArray histogram) could be added (this is in the works).
