diff --git a/README.rst b/README.rst
index 39d6db37..67273dcc 100644
--- a/README.rst
+++ b/README.rst
@@ -1,23 +1,9 @@
 hdf5plugin
 ==========
 
-This module provides HDF5 compression filters (namely: blosc, bitshuffle, lz4, FCIDECOMP, ZFP, zstd) and registers them to the HDF5 library used by `h5py `_.
+*hdf5plugin* provides HDF5 compression filters (namely: blosc, bitshuffle, lz4, FCIDECOMP, ZFP, zstd) and makes them usable from `h5py `_.
 
-* Supported operating systems: Linux, Windows, macOS.
-* Supported versions of Python: >= 3.4
-
-`hdf5plugin` provides a generic way to enable the use of the provided HDF5 compression filters with `h5py` that can be installed via `pip` or `conda`.
-
-Alternatives to install HDF5 compression filters are: system-wide installation on Linux or other conda packages: `blosc-hdf5-plugin `_, `hdf5-lz4 `_.
-
-The HDF5 plugin sources were obtained from:
-
-* LZ4 plugin (v0.1.0) and lz4 (v1.3.0, tag r122): https://github.com/nexusformat/HDF5-External-Filter-Plugins, https://github.com/lz4/lz4
-* bitshuffle plugin (0.3.5): https://github.com/kiyo-masui/bitshuffle
-* hdf5-blosc plugin (v1.0.0), c-blosc (v1.20.1) and snappy (v1.1.1): https://github.com/Blosc/hdf5-blosc, https://github.com/Blosc/c-blosc and https://github.com/Blosc/c-blosc/tree/v1.17.0/internal-complibs/snappy-1.1.1
-* FCIDECOMP plugin (v1.0.2) and CharLS (branch 1.x-master SHA1 ID:25160a42fb62e71e4b0ce081f5cb3f8bb73938b5): ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/FCI_Decompression_Software_V1.0.2/ and https://github.com/team-charls/charls.git
-* HDF5-ZFP plugin (v1.0.1) and zfp (v0.5.5): https://github.com/LLNL/H5Z-ZFP and https://github.com/LLNL/zfp
-* HDF5Plugin-Zstandard (commit d5afdb5) and zstd (v1.4.5): https://github.com/aparamon/HDF5Plugin-Zstandard and https://github.com/Blosc/c-blosc/tree/v1.20.1/internal-complibs/zstd-1.4.5
+See `documentation `_.
 
 Installation
 ------------
@@ -36,254 +22,19 @@ To install from source and recompile the HDF5 plugins, run::
 
 Installing from source can achieve better performance by enabling AVX2 and OpenMP if available.
 
-Documentation
--------------
-
-To use it, just use ``import hdf5plugin`` and supported compression filters are available from `h5py `_.
-
-Sample code:
-
-.. code-block:: python
-
-    import numpy
-    import h5py
-    import hdf5plugin
-
-    # Compression
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
-    f.close()
-
-    # Decompression
-    f = h5py.File('test.h5', 'r')
-    data = f['data'][()]
-    f.close()
-
-``hdf5plugin`` provides:
-
-* Compression option helper classes to prepare arguments to provide to ``h5py.Group.create_dataset``:
-
-  - `Bitshuffle(nelems=0, lz4=True)`_
-  - `Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE)`_
-  - `FciDecomp()`_
-  - `LZ4(nbytes=0)`_
-  - `Zfp()`_
-
-* The HDF5 filter ID of embedded plugins:
-
-  - ``BLOSC_ID``
-  - ``BSHUF_ID``
-  - ``FCIDECOMP_ID``
-  - ``LZ4_ID``
-  - ``ZFP_ID``
-  - ``ZSTD_ID``
-
-* ``FILTERS``: A dictionary mapping provided filters to their ID
-* ``PLUGINS_PATH``: The directory where the provided filters library are stored.
-
-Bitshuffle(nelems=0, lz4=True)
-******************************
-
-This class takes the following arguments and returns the compression options to feed into ``h5py.Group.create_dataset`` for using the bitshuffle filter:
-
-* **nelems** the number of elements per block, needs to be divisible by eight (default is 0, about 8kB per block)
-* **lz4** if True the elements get compressed using lz4 (default is True)
-
-It can be passed as keyword arguments.
-
-Sample code:
-
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('bitshuffle_with_lz4', data=numpy.arange(100),
-        **hdf5plugin.Bitshuffle(nelems=0, lz4=True))
-    f.close()
-
-Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE)
-*********************************************
-
-This class takes the following arguments and returns the compression options to feed into ``h5py.Group.create_dataset`` for using the blosc filter:
-
-* **cname** the compression algorithm, one of:
-
-  * 'blosclz'
-  * 'lz4' (default)
-  * 'lz4hc'
-  * 'snappy' (optional, requires C++11)
-  * 'zlib'
-  * 'zstd'
-
-* **clevel** the compression level, from 0 to 9 (default is 5)
-* **shuffle** the shuffling mode, in:
-
-  * `Blosc.NOSHUFFLE` (0): No shuffle
-  * `Blosc.SHUFFLE` (1): byte-wise shuffle (default)
-  * `Blosc.BITSHUFFLE` (2): bit-wise shuffle
-
-It can be passed as keyword arguments.
-
-Sample code:
-
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('blosc_byte_shuffle_blosclz', data=numpy.arange(100),
-        **hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
-    f.close()
-
-FciDecomp()
-***********
-
-This class returns the compression options to feed into ``h5py.Group.create_dataset`` for using the FciDecomp filter:
-
-It can be passed as keyword arguments.
-
-Sample code:
-
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('fcidecomp', data=numpy.arange(100),
-        **hdf5plugin.FciDecomp())
-    f.close()
-
-LZ4(nbytes=0)
-*************
-
-This class takes the number of bytes per block as argument and returns the compression options to feed into ``h5py.Group.create_dataset`` for using the lz4 filter:
-
-* **nbytes** number of bytes per block needs to be in the range of 0 < nbytes < 2113929216 (1,9GB).
-  The default value is 0 (for 1GB).
-
-It can be passed as keyword arguments.
+For more details, see the `installation documentation `_.
 
-Sample code:
+How-to use
+----------
 
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('lz4', data=numpy.arange(100),
-        **hdf5plugin.LZ4(nbytes=0))
-    f.close()
-
-Zfp()
-*****
-
-This class returns the compression options to feed into ``h5py.Group.create_dataset`` for using the zfp filter:
-
-It can be passed as keyword arguments.
-
-Sample code:
-
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('zfp', data=numpy.random.random(100),
-        **hdf5plugin.Zfp())
-    f.close()
-
-The zfp filter compression mode is defined by the provided arguments.
-The following compression modes are supported:
-
-- **Fixed-rate** mode:
-  For details, see `zfp fixed-rate mode `_.
-
-  .. code-block:: python
-
-      f.create_dataset('zfp_fixed_rate', data=numpy.random.random(100),
-          **hdf5plugin.Zfp(rate=10.0))
-
-- **Fixed-precision** mode:
-  For details, see `zfp fixed-precision mode `_.
-
-  .. code-block:: python
-
-      f.create_dataset('zfp_fixed_precision', data=numpy.random.random(100),
-          **hdf5plugin.Zfp(precision=10))
-
-- **Fixed-accuracy** mode:
-  For details, see `zfp fixed-accuracy mode `_.
-
-  .. code-block:: python
-
-      f.create_dataset('zfp_fixed_accuracy', data=numpy.random.random(100),
-          **hdf5plugin.Zfp(accuracy=0.001))
-
-- **Reversible** (i.e., lossless) mode:
-  For details, see `zfp reversible mode `_.
-
-  .. code-block:: python
-
-      f.create_dataset('zfp_reversible', data=numpy.random.random(100),
-          **hdf5plugin.Zfp(reversible=True))
-
-- **Expert** mode:
-  For details, see `zfp expert mode `_.
-
-  .. code-block:: python
-
-      f.create_dataset('zfp_expert', data=numpy.random.random(100),
-          **hdf5plugin.Zfp(minbits=1, maxbits=16657, maxprec=64, minexp=-1074))
-
-Zstd()
-******
-
-This class returns the compression options to feed into ``h5py.Group.create_dataset`` for using the Zstd filter:
-
-It can be passed as keyword arguments.
-
-Sample code:
-
-.. code-block:: python
-
-    f = h5py.File('test.h5', 'w')
-    f.create_dataset('zstd', data=numpy.arange(100),
-        **hdf5plugin.Zstd())
-    f.close()
-
-
-Dependencies
-------------
-
-* `h5py `_
-
-Testing
--------
-
-To run self-contained tests, from Python:
-
-.. code-block:: python
-
-    import hdf5plugin.test
-    hdf5plugin.test.run_tests()
-
-Or, from the command line::
-
-    python -m hdf5plugin.test
-
-To also run tests relying on actual HDF5 files, run from the source directory::
-
-    python test/test.py
+To use it, just ``import hdf5plugin``: the supported compression filters are then available from `h5py `_.
 
-This tests the installed version of `hdf5plugin`.
+For details, see `Usage documentation `_.
 
 License
 -------
 
-The source code of *hdf5plugin* itself is licensed under the MIT license.
-Use it at your own risk.
-See `LICENSE `_
-
-The source code of the embedded HDF5 filter plugin libraries is licensed under different open-source licenses.
-Please read the different licenses:
-
-* bitshuffle: See `src/bitshuffle/LICENSE `_
-* blosc: See `src/hdf5-blosc/LICENSES/ `_ and `src/c-blosc/LICENSES/ `_
-* lz4: See `src/LZ4/COPYING `_ and `src/lz4-r122/LICENSE `_
-* FCIDECOMP: See `src/fcidecomp/LICENSE `_ and `src/charls/src/License.txt `_
-* zfp: See `src/H5Z-ZFP/LICENSE `_ and `src/zfp/LICENSE `_
-* zstd: See `src/HDF5Plugin-Zstandard/LICENSE`
+The source code of *hdf5plugin* itself is licensed under the `MIT license `_.
 
-The HDF5 v1.10.5 headers (and Windows .lib file) used to build the filters are stored for convenience in the repository. The license is available here: `src/hdf5/COPYING `_.
+Embedded HDF5 compression filters are licensed under different open-source licenses:
+see the `license documentation `_.
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 00000000..d4bb2cbb
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/doc/changelog.rst b/doc/changelog.rst
new file mode 100644
index 00000000..010444b3
--- /dev/null
+++ b/doc/changelog.rst
@@ -0,0 +1,5 @@
+===========
+ Changelog
+===========
+
+.. include:: ../CHANGELOG.rst
\ No newline at end of file
diff --git a/doc/compression_opts.rst b/doc/compression_opts.rst
deleted file mode 100644
index 4c01aee9..00000000
--- a/doc/compression_opts.rst
+++ /dev/null
@@ -1,88 +0,0 @@
-=====================
- Compression options
-=====================
-
-Compression filters can be configured with the ``compression_opts`` argument of `h5py.Group.create_dataset `_ method by providing a tuple of integers.
-
-The meaning of those integers is filter dependent and is described below.
-
-bitshuffle
-..........
-
-compression_opts: (**block_size**, **lz4 compression**)
-
-- **block size**: Number of elements (not bytes) per block.
-  It MUST be a mulitple of 8.
-  Default: 0 for a block size of about 8 kB.
-- **lz4 compression**: 0: disabled (default), 2: enabled.
-
-By default the filter uses bitshuffle, but does NOT compress with LZ4.
-
-blosc
-.....
-
-compression_opts: (0, 0, 0, 0, **compression level**, **shuffle**, **compression**)
-
-- First 4 values are reserved.
-- **compression level**:
-  From 0 (no compression) to 9 (maximum compression).
-  Default: 5.
-- **shuffle**: Shuffle filter:
-
-  * 0: no shuffle
-  * 1: byte shuffle
-  * 2: bit shuffle
-
-- **compression**: The compressor blosc ID:
-
-  * 0: blosclz (default)
-  * 1: lz4
-  * 2: lz4hc
-  * 3: snappy
-  * 4: zlib
-  * 5: zstd
-
-By default the filter uses byte shuffle and blosclz.
-
-lz4
-...
-
-compression_opts: (**block_size**,)
-
-- **block size**: Number of bytes per block.
-  Default 0 for a block size of 1GB.
-  It MUST be < 1.9 GB.
-
-zfp
-...
-
-For more information, see `zfp modes `_ and `hdf5-zfp generic interface `_.
-
-The first value of *compression_opts* is **mode**.
-The following values depends on the value of **mode**:
-
-- *Fixed-rate* mode: (1, 0, **rateHigh**, **rateLow**, 0, 0)
-  Rate, i.e., number of compressed bits per value, as a double stored as:
-
-  - **rateHigh**: High 32-bit word of the rate double.
-  - **rateLow**: Low 32-bit word of the rate double.
-
-- *Fixed-precision* mode: (2, 0, **prec**, 0, 0, 0)
-
-  - **prec**: Number of uncompressed bits per value.
-
-- *Fixed-accuracy* mode: (3, 0, **accHigh**, **accLow**, 0, 0)
-  Accuracy, i.e., absolute error tolerance, as a double stored as:
-
-  - **accHigh**: High 32-bit word of the accuracy double.
-  - **accLow**: Low 32-bit word of the accuracy double.
-
-- *Expert* mode: (4, 0, **minbits**, **maxbits**, **maxprec**, **minexp**)
-
-  - **minbits**: Minimum number of compressed bits used to represent a block.
-  - **maxbits**: Maximum number of bits used to represent a block.
-  - **maxprec**: Maximum number of bit planes encoded.
-  - **minexp**: Smallest absolute bit plane number encoded.
-
-- *Reversible* mode: (5, 0, 0, 0, 0, 0)
-
diff --git a/doc/conf.py b/doc/conf.py
new file mode 100644
index 00000000..9a925c72
--- /dev/null
+++ b/doc/conf.py
@@ -0,0 +1,52 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+import os
+
+# See https://docs.readthedocs.io/en/stable/builds.html#build-environment
+on_rtd = os.environ.get('READTHEDOCS') == 'True'
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'hdf5plugin'
+copyright = u'2016-2021, Data analysis unit, European Synchrotron Radiation Facility, Grenoble'
+author = 'ESRF - Data Analysis Unit'
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'sphinx.ext.autodoc',
+]
+
+if not on_rtd:
+    extensions.append('sphinx_rtd_theme')
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'default' if on_rtd else 'sphinx_rtd_theme'
diff --git a/doc/contribute.rst b/doc/contribute.rst
index d29396c0..b06e704d 100644
--- a/doc/contribute.rst
+++ b/doc/contribute.rst
@@ -1,9 +1,39 @@
-==============
- Contributing
-==============
+============
+ Contribute
+============
 
 This project follows the standard open-source project github workflow,
 which is described in other projects like `matplotlib `_ or `scikit-image `_.
 
+Testing
+=======
+
+To run self-contained tests, from Python:
+
+.. code-block:: python
+
+    import hdf5plugin.test
+    hdf5plugin.test.run_tests()
+
+Or, from the command line::
+
+    python -m hdf5plugin.test
+
+To also run tests relying on actual HDF5 files, run from the source directory::
+
+    python test/test.py
+
+This tests the installed version of `hdf5plugin`.
+
+Building documentation
+======================
+
+Documentation relies on `Sphinx `_.
+
+To build documentation, run from the project root directory::
+
+    python setup.py build
+    PYTHONPATH=build/lib.--/ sphinx-build -b html doc/ build/html
+
 Guidelines to add a compression filter
 ======================================
 
@@ -37,12 +67,100 @@ This briefly describes the steps to add a HDF5 compression filter to the zoo.
   - In ``test/test.py`` for testing reading a compressed file that was produced with another software.
   - In ``src/hdf5plugin/test.py`` for tests that writes data using the compression filter and the compression options helper function and reads back the data.
 
-* Update the ``README.rst`` file to document:
+* Update the ``doc/information.rst`` file to document:
 
   - The version of the HDF5 filter that is embedded in ``hdf5plugin``.
   - The license of the filter (by adding a link to the license file).
-  - The ``hdf5plugin.`` filter ID "CONSTANT".
-  - The ``hdf5plugin._options`` compression helper function.
 
-* Update ``doc/compression_opts.rst`` to document the format of ``compression_opts`` expected by the filter.
+* Update the ``doc/usage.rst`` file to document:
+
+  - The ``hdf5plugin.`` compression argument helper class.
+
+* Update ``doc/contribute.rst`` to document the format of ``compression_opts`` expected by the filter (see `Low-level compression filter arguments`_ below).
+
+Low-level compression filter arguments
+======================================
+
+Compression filters can be configured with the ``compression_opts`` argument of the `h5py.Group.create_dataset `_ method by providing a tuple of integers.
+
+The meaning of those integers is filter dependent and is described below.
+
+bitshuffle
+..........
+
+compression_opts: (**block_size**, **lz4 compression**)
+
+- **block size**: Number of elements (not bytes) per block.
+  It MUST be a multiple of 8.
+  Default: 0 for a block size of about 8 kB.
+- **lz4 compression**: 0: disabled (default), 2: enabled.
+
+By default the filter uses bitshuffle, but does NOT compress with LZ4.
+
+blosc
+.....
+
+compression_opts: (0, 0, 0, 0, **compression level**, **shuffle**, **compression**)
+
+- First 4 values are reserved.
+- **compression level**:
+  From 0 (no compression) to 9 (maximum compression).
+  Default: 5.
+- **shuffle**: Shuffle filter:
+
+  * 0: no shuffle
+  * 1: byte shuffle
+  * 2: bit shuffle
+
+- **compression**: The compressor blosc ID:
+
+  * 0: blosclz (default)
+  * 1: lz4
+  * 2: lz4hc
+  * 3: snappy
+  * 4: zlib
+  * 5: zstd
+
+By default the filter uses byte shuffle and blosclz.
+
+lz4
+...
+
+compression_opts: (**block_size**,)
+
+- **block size**: Number of bytes per block.
+  Default: 0 for a block size of 1 GB.
+  It MUST be < 1.9 GB.
+
+zfp
+...
+
+For more information, see `zfp modes `_ and `hdf5-zfp generic interface `_.
+
+The first value of *compression_opts* is **mode**.
+The following values depend on the value of **mode**:
+
+- *Fixed-rate* mode: (1, 0, **rateHigh**, **rateLow**, 0, 0)
+  Rate, i.e., number of compressed bits per value, as a double stored as:
+
+  - **rateHigh**: High 32-bit word of the rate double.
+  - **rateLow**: Low 32-bit word of the rate double.
+
+- *Fixed-precision* mode: (2, 0, **prec**, 0, 0, 0)
+
+  - **prec**: Number of uncompressed bits per value.
+
+- *Fixed-accuracy* mode: (3, 0, **accHigh**, **accLow**, 0, 0)
+  Accuracy, i.e., absolute error tolerance, as a double stored as:
+
+  - **accHigh**: High 32-bit word of the accuracy double.
+  - **accLow**: Low 32-bit word of the accuracy double.
+
+- *Expert* mode: (4, 0, **minbits**, **maxbits**, **maxprec**, **minexp**)
+
+  - **minbits**: Minimum number of compressed bits used to represent a block.
+  - **maxbits**: Maximum number of bits used to represent a block.
+  - **maxprec**: Maximum number of bit planes encoded.
+  - **minexp**: Smallest absolute bit plane number encoded.
+
+- *Reversible* mode: (5, 0, 0, 0, 0, 0)
diff --git a/doc/index.rst b/doc/index.rst
new file mode 100644
index 00000000..1783024a
--- /dev/null
+++ b/doc/index.rst
@@ -0,0 +1,41 @@
+hdf5plugin
+==========
+
+*hdf5plugin* provides HDF5 compression filters (namely: blosc, bitshuffle, lz4, FCIDECOMP, ZFP, zstd) and makes them usable from `h5py `_.
+
+* Supported operating systems: Linux, Windows, macOS.
+* Supported versions of Python: >= 3.4
+* Supported architectures: All.
+  Specific optimizations are available for *x86* family and *ppc64le*.
+
+*hdf5plugin*, which can be installed via `pip` or `conda`, provides a generic way to enable the use of the provided HDF5 compression filters with `h5py`.
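+
+For instance, reading a compressed dataset only requires importing the package first (a minimal sketch, assuming an existing file ``compressed.h5`` that contains a dataset named ``data``):
+
+.. code-block:: python
+
+    import h5py
+    import hdf5plugin  # registers the compression filters with HDF5
+
+    with h5py.File('compressed.h5', 'r') as f:
+        data = f['data'][()]  # decompression happens transparently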
+
+Alternative ways to install HDF5 compression filters are a system-wide installation on Linux or other conda packages: `blosc-hdf5-plugin `_, `hdf5-lz4 `_.
+
+:doc:`install`
+    How-to install *hdf5plugin*
+
+:doc:`usage`
+    How-to use *hdf5plugin*
+
+:doc:`information`
+    Releases, changelog, repository, license
+
+:doc:`contribute`
+    How-to contribute to *hdf5plugin*
+
+.. toctree::
+   :hidden:
+
+   install.rst
+   usage.rst
+   information.rst
+   contribute.rst
+   changelog.rst
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/doc/information.rst b/doc/information.rst
new file mode 100644
index 00000000..da728c15
--- /dev/null
+++ b/doc/information.rst
@@ -0,0 +1,59 @@
+=====================
+ Project information
+=====================
+
+Releases
+--------
+
+Source code and pre-built binaries (aka Python wheels) for Windows, macOS and
+ManyLinux are available at the following places:
+
+- `Wheels and source code on PyPI `_
+- `Packages on conda-forge `_
+
+For the history of modifications, see the :doc:`changelog`.
+
+Project resources
+-----------------
+
+- `Source repository `_
+- `Issue tracker `_
+- Continuous integration: *hdf5plugin* is continuously tested on all three major
+  operating systems:
+
+  - Linux, macOS, Windows: `GitHub Actions `_
+  - Windows: `AppVeyor `_
+
+- `Weekly builds `_
+
+HDF5 filters and compression libraries
+--------------------------------------
+
+The sources of the HDF5 compression filters and of the compression libraries were obtained from:
+
+* LZ4 plugin (v0.1.0) and lz4 (v1.3.0, tag r122): https://github.com/nexusformat/HDF5-External-Filter-Plugins, https://github.com/lz4/lz4
+* bitshuffle plugin (0.3.5): https://github.com/kiyo-masui/bitshuffle
+* hdf5-blosc plugin (v1.0.0), c-blosc (v1.20.1) and snappy (v1.1.1): https://github.com/Blosc/hdf5-blosc, https://github.com/Blosc/c-blosc and https://github.com/Blosc/c-blosc/tree/v1.17.0/internal-complibs/snappy-1.1.1
+* FCIDECOMP plugin (v1.0.2) and CharLS (branch 1.x-master SHA1 ID: 25160a42fb62e71e4b0ce081f5cb3f8bb73938b5):
+  ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_Test-Data/FCI_Decompression_Software_V1.0.2 and
+  https://github.com/team-charls/charls
+* HDF5-ZFP plugin (v1.0.1) and zfp (v0.5.5): https://github.com/LLNL/H5Z-ZFP and https://github.com/LLNL/zfp
+* HDF5Plugin-Zstandard (commit d5afdb5) and zstd (v1.4.5): https://github.com/aparamon/HDF5Plugin-Zstandard and https://github.com/Blosc/c-blosc/tree/v1.20.1/internal-complibs/zstd-1.4.5
+
+License
+-------
+
+The source code of *hdf5plugin* itself is licensed under the MIT license.
+Use it at your own risk.
+See `LICENSE `_.
+
+The source code of the embedded HDF5 filter plugin libraries is licensed under different open-source licenses.
+Please read the different licenses:
+
+* bitshuffle: See `src/bitshuffle/LICENSE `_
+* blosc: See `src/hdf5-blosc/LICENSES/ `_ and `src/c-blosc/LICENSES/ `_
+* lz4: See `src/LZ4/COPYING `_ and `src/lz4-r122/LICENSE `_
+* FCIDECOMP: See `src/fcidecomp/LICENSE `_ and `src/charls/src/License.txt `_
+* zfp: See `src/H5Z-ZFP/LICENSE `_ and `src/zfp/LICENSE `_
+* zstd: See `src/HDF5Plugin-Zstandard/LICENSE `_
+
+The HDF5 v1.10.5 headers (and Windows .lib file) used to build the filters are stored for convenience in the repository. The license is available here: `src/hdf5/COPYING `_.
diff --git a/doc/install.rst b/doc/install.rst
new file mode 100644
index 00000000..64e00658
--- /dev/null
+++ b/doc/install.rst
@@ -0,0 +1,72 @@
+==============
+ Installation
+==============
+
+Pre-built packages
+------------------
+
+Pre-built binaries of `hdf5plugin` are available from:
+
+- `pypi `_, to install run:
+  ``pip install hdf5plugin [--user]``
+- `conda-forge `_, to install run:
+  ``conda install -c conda-forge hdf5plugin``
+
+To maximize compatibility, those binaries are built without optimization options (such as `AVX2`_ and `OpenMP`_).
+`Installation from source`_ can achieve better performance than pre-built binaries.
+
+Installation from source
+------------------------
+
+The build process enables compilation optimizations that are supported by the host machine.
+
+To install from source and recompile the HDF5 plugins, run::
+
+    pip install hdf5plugin --no-binary hdf5plugin [--user]
+
+To override the defaults that are probed from the machine, it is possible to specify build options.
+This is achieved by either setting environment variables or passing options to ``python setup.py build``, for example:
+
+- ``HDF5PLUGIN_OPENMP=False pip install hdf5plugin --no-binary hdf5plugin``
+- From the source directory: ``python setup.py build --openmp=False``
+
+Available options
+.................
+
+.. list-table::
+   :widths: 1 1 4
+   :header-rows: 1
+
+   * - Environment variable
+     - ``python setup.py build`` option
+     - Description
+   * - ``HDF5PLUGIN_HDF5_DIR``
+     - ``--hdf5``
+     - Custom path to HDF5 (as in h5py).
+   * - ``HDF5PLUGIN_OPENMP``
+     - ``--openmp``
+     - Whether or not to compile with `OpenMP`_.
+       Default: True if probed (always False on macOS).
+   * - ``HDF5PLUGIN_NATIVE``
+     - ``--native``
+     - True to compile specifically for the host, False for generic support (for Unix compilers only).
+       Default: True on supported architectures, False otherwise.
+   * - ``HDF5PLUGIN_SSE2``
+     - ``--sse2``
+     - Whether or not to compile with `SSE2`_ support.
+       Default: True on ppc64le and when probed on x86, False otherwise.
+   * - ``HDF5PLUGIN_AVX2``
+     - ``--avx2``
+     - Whether or not to compile with `AVX2`_ support. avx2=True requires sse2=True.
+       Default: True on x86 when probed, False otherwise.
+   * - ``HDF5PLUGIN_CPP11``
+     - ``--cpp11``
+     - Whether or not to compile C++11 code if available.
+       Default: True if probed.
+
+Note: Boolean options are passed as ``True`` or ``False``.
+
+
+.. _AVX2: https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2
+.. _SSE2: https://en.wikipedia.org/wiki/SSE2
+.. _OpenMP: https://www.openmp.org/
diff --git a/doc/make.bat b/doc/make.bat
new file mode 100644
index 00000000..922152e9
--- /dev/null
+++ b/doc/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/doc/usage.rst b/doc/usage.rst
new file mode 100644
index 00000000..d32b28d5
--- /dev/null
+++ b/doc/usage.rst
@@ -0,0 +1,114 @@
+=======
+ Usage
+=======
+
+.. currentmodule:: hdf5plugin
+
+``hdf5plugin`` allows using additional HDF5 compression filters with `h5py`_ for reading and writing compressed datasets.
+
+Available constants:
+
+* ``hdf5plugin.FILTERS``: A dictionary mapping provided filters to their ID
+* ``hdf5plugin.PLUGINS_PATH``: The directory where the provided filter libraries are stored.
+
+Read compressed datasets
+++++++++++++++++++++++++
+
+In order to read compressed datasets with `h5py`_, use:
+
+.. code-block:: python
+
+    import hdf5plugin
+
+This registers the compression filters supported by ``hdf5plugin`` with the HDF5 library used by `h5py`_.
+Hence, HDF5 compressed datasets can be read like any other dataset (see the `h5py documentation `_).
+
+Write compressed datasets
++++++++++++++++++++++++++
+
+As for reading compressed datasets, ``import hdf5plugin`` is required to enable the supported compression filters.
+
+To create a compressed dataset, use `h5py.Group.create_dataset`_ and set the ``compression`` and ``compression_opts`` arguments.
+
+``hdf5plugin`` provides helpers to prepare those compression options: `Bitshuffle`_, `Blosc`_, `FciDecomp`_, `LZ4`_, `Zfp`_, `Zstd`_.
+
+Sample code:
+
+.. code-block:: python
+
+    import numpy
+    import h5py
+    import hdf5plugin
+
+    # Compression
+    f = h5py.File('test.h5', 'w')
+    f.create_dataset('data', data=numpy.arange(100), **hdf5plugin.LZ4())
+    f.close()
+
+    # Decompression
+    f = h5py.File('test.h5', 'r')
+    data = f['data'][()]
+    f.close()
+
+Relevant `h5py`_ documentation: `Filter pipeline `_ and `Chunked Storage `_.
+
+
+Bitshuffle
+==========
+
+.. autoclass:: Bitshuffle
+   :members:
+   :undoc-members:
+
+Blosc
+=====
+
+.. autoclass:: Blosc
+   :members:
+   :undoc-members:
+
+FciDecomp
+=========
+
+.. autoclass:: FciDecomp
+   :members:
+   :undoc-members:
+
+LZ4
+===
+
+.. autoclass:: LZ4
+   :members:
+   :undoc-members:
+
+Zfp
+===
+
+.. autoclass:: Zfp
+   :members:
+   :undoc-members:
+
+Zstd
+====
+
+.. autoclass:: Zstd
+   :members:
+   :undoc-members:
+
+Use HDF5 filters in other applications
+++++++++++++++++++++++++++++++++++++++
+
+Non-`h5py`_ or non-Python users can also benefit from the supplied HDF5 compression filters for reading compressed datasets by setting the ``HDF5_PLUGIN_PATH`` environment variable to the value of ``hdf5plugin.PLUGINS_PATH``, which can be retrieved from the command line with::
+
+    python -c "import hdf5plugin; print(hdf5plugin.PLUGINS_PATH)"
+
+For instance::
+
+    export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGINS_PATH)")
+
+should allow MATLAB or IDL users to read data compressed using the supported plugins.
+
+Setting the ``HDF5_PLUGIN_PATH`` environment variable allows already existing programs or Python code to read compressed data without any modification.
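+
+As a quick check, the standard HDF5 command-line tools can then dump a compressed dataset (a minimal sketch, assuming ``h5dump`` is installed and a file ``test.h5`` with a dataset ``data`` as in the sample code above)::
+
+    # make the filters visible to any HDF5-based program, then dump the dataset
+    export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGINS_PATH)")
+    h5dump -d /data test.h5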
+
+.. _h5py: https://www.h5py.org
+.. _h5py.Group.create_dataset: https://docs.h5py.org/en/stable/high/group.html#h5py.Group.create_dataset
diff --git a/setup.py b/setup.py
index 193f01f7..7736ca3a 100644
--- a/setup.py
+++ b/setup.py
@@ -752,6 +752,7 @@ def make_distribution(self):
     ext_modules=extensions,
     install_requires=['h5py'],
     setup_requires=['setuptools'],
+    extras_require={'dev': ['sphinx', 'sphinx_rtd_theme']},
    cmdclass=cmdclass,
     libraries=libraries,
     zip_safe=False,
diff --git a/src/hdf5plugin/__init__.py b/src/hdf5plugin/__init__.py
index d3e42e50..fc18a859 100644
--- a/src/hdf5plugin/__init__.py
+++ b/src/hdf5plugin/__init__.py
@@ -117,14 +117,57 @@ def __getitem__(self, item):
         return self._kwargs[item]
 
 
+class Bitshuffle(_FilterRefClass):
+    """``h5py.Group.create_dataset``'s compression arguments for using bitshuffle filter.
+
+    It can be passed as keyword arguments:
+
+    .. code-block:: python
+
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset(
+            'bitshuffle_with_lz4',
+            data=numpy.arange(100),
+            **hdf5plugin.Bitshuffle(nelems=0, lz4=True))
+        f.close()
+
+    :param int nelems:
+        The number of elements per block.
+        It needs to be divisible by eight.
+        Default: 0 (for about 8 kB per block).
+    :param bool lz4:
+        Whether to use lz4 compression or not as part of the filter.
+        Default: True
+    """
+    filter_id = BSHUF_ID
+
+    def __init__(self, nelems=0, lz4=True):
+        nelems = int(nelems)
+        assert nelems % 8 == 0
+
+        lz4_enabled = 2 if lz4 else 0
+        self.filter_options = (nelems, lz4_enabled)
+
+
 class Blosc(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using blosc filter.
+    """``h5py.Group.create_dataset``'s compression arguments for using blosc filter.
+
+    It can be passed as keyword arguments:
+
+    .. code-block:: python
+
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset(
+            'blosc_byte_shuffle_blosclz',
+            data=numpy.arange(100),
+            **hdf5plugin.Blosc(cname='blosclz', clevel=9, shuffle=hdf5plugin.Blosc.SHUFFLE))
+        f.close()
 
     :param str cname:
         `blosclz`, `lz4` (default), `lz4hc`, `zlib`, `zstd`
         Optional: `snappy`, depending on compilation (requires C++11).
     :param int clevel:
-        Compression level from 0 no compression to 9 maximum compression.
+        Compression level from 0 (no compression) to 9 (maximum compression).
         Default: 5.
     :param int shuffle: One of:
         - Blosc.NOSHUFFLE (0): No shuffle
@@ -160,31 +203,45 @@ def __init__(self, cname='lz4', clevel=5, shuffle=SHUFFLE):
         self.filter_options = (0, 0, 0, 0, clevel, shuffle, compression)
 
 
-class Bitshuffle(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using bitshuffle filter.
+class FciDecomp(_FilterRefClass):
+    """``h5py.Group.create_dataset``'s compression arguments for using FciDecomp filter.
 
-    :param int nelems:
-        The number of elements per block.
-        Default: 0 (for about 8kB per block).
-    :param bool lz4:
-        Whether to use LZ4_ID compression or not as part of the filter.
-        Default: True
-    """
-    filter_id = BSHUF_ID
+    It can be passed as keyword arguments:
 
-    def __init__(self, nelems=0, lz4=True):
-        nelems = int(nelems)
-        assert nelems % 8 == 0
+    .. code-block:: python
 
-        lz4_enabled = 2 if lz4 else 0
-        self.filter_options = (nelems, lz4_enabled)
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset(
+            'fcidecomp',
+            data=numpy.arange(100),
+            **hdf5plugin.FciDecomp())
+        f.close()
+    """
+    filter_id = FCIDECOMP_ID
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        if not config.cpp11:
+            _logger.error(
+                "The FciDecomp filter is not available as hdf5plugin was not built with C++11.\n"
+                "You may need to reinstall hdf5plugin with a recent version of pip, or rebuild it with a newer compiler.")
 
 
 class LZ4(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using lz4 filter.
+    """``h5py.Group.create_dataset``'s compression arguments for using lz4 filter.
 
-    :param int nelems:
+    It can be passed as keyword arguments:
+
+    .. code-block:: python
+
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset('lz4', data=numpy.arange(100),
+            **hdf5plugin.LZ4(nbytes=0))
+        f.close()
+
+    :param int nbytes:
         The number of bytes per block.
+        It needs to be in the range of 0 < nbytes < 2113929216 (1.9 GB).
         Default: 0 (for 1 GB per block).
     """
     filter_id = LZ4_ID
@@ -196,20 +253,70 @@ def __init__(self, nbytes=0):
 
 
 class Zfp(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using ZFP filter.
+    """``h5py.Group.create_dataset``'s compression arguments for using ZFP filter.
+
+    It can be passed as keyword arguments:
+
+    .. code-block:: python
+
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset(
+            'zfp',
+            data=numpy.random.random(100),
+            **hdf5plugin.Zfp())
+        f.close()
 
     This filter provides different modes:
 
-    - **Fixed-rate** mode: To use, set the `rate` argument.
-      For details, see https://zfp.readthedocs.io/en/latest/modes.html#fixed-rate-mode.
-    - **Fixed-precision** mode: To use, set the `precision` argument.
-      For details, see https://zfp.readthedocs.io/en/latest/modes.html#fixed-precision-mode.
-    - **Fixed-accuracy** mode: To use, set the `accuracy` argument
-      For details, see https://zfp.readthedocs.io/en/latest/modes.html#fixed-accuracy-mode.
-    - **Reversible** (i.e., lossless) mode: To use, set the `reversible` argument to True
-      For details, see https://zfp.readthedocs.io/en/latest/modes.html#reversible-mode.
-    - **Expert** mode: To use, set the `minbits`, `maxbits`, `maxprec` and ``minexp` arguments.
-      For details, see https://zfp.readthedocs.io/en/latest/modes.html#expert-mode.
+    - **Fixed-rate** mode: To use, set the ``rate`` argument.
+      For details, see `zfp fixed-rate mode `_.
+
+      .. code-block:: python
+
+          f.create_dataset(
+              'zfp_fixed_rate',
+              data=numpy.random.random(100),
+              **hdf5plugin.Zfp(rate=10.0))
+
+    - **Fixed-precision** mode: To use, set the ``precision`` argument.
+      For details, see `zfp fixed-precision mode `_.
+
+      .. code-block:: python
+
+          f.create_dataset(
+              'zfp_fixed_precision',
+              data=numpy.random.random(100),
+              **hdf5plugin.Zfp(precision=10))
+
+    - **Fixed-accuracy** mode: To use, set the ``accuracy`` argument.
+      For details, see `zfp fixed-accuracy mode `_.
+
+      .. code-block:: python
+
+          f.create_dataset(
+              'zfp_fixed_accuracy',
+              data=numpy.random.random(100),
+              **hdf5plugin.Zfp(accuracy=0.001))
+
+    - **Reversible** (i.e., lossless) mode: To use, set the ``reversible`` argument to True.
+      For details, see `zfp reversible mode `_.
+
+      .. code-block:: python
+
+          f.create_dataset(
+              'zfp_reversible',
+              data=numpy.random.random(100),
+              **hdf5plugin.Zfp(reversible=True))
+
+    - **Expert** mode: To use, set the ``minbits``, ``maxbits``, ``maxprec`` and ``minexp`` arguments.
+      For details, see `zfp expert mode `_.
+
+      .. code-block:: python
+
+          f.create_dataset(
+              'zfp_expert',
+              data=numpy.random.random(100),
+              **hdf5plugin.Zfp(minbits=1, maxbits=16657, maxprec=64, minexp=-1074))
 
     :param float rate:
         Use fixed-rate mode and set the number of compressed bits per value.
@@ -271,22 +378,20 @@ def __init__(self,
 
 
 class Zstd(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using FciDecomp filter.
-    """
-    filter_id = ZSTD_ID
+    """``h5py.Group.create_dataset``'s compression arguments for using Zstd filter.
 
+    It can be passed as keyword arguments:
 
-class FciDecomp(_FilterRefClass):
-    """h5py.Group.create_dataset's compression and compression_opts arguments for using FciDecomp filter.
-    """
-    filter_id = FCIDECOMP_ID
+    .. code-block:: python
 
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        if not config.cpp11:
-            _logger.error(
-                "The FciDecomp filter is not available as hdf5plugin was not built with C++11.\n"
-                "You may need to reinstall hdf5plugin with a recent version of pip, or rebuild it with a newer compiler.")
+        f = h5py.File('test.h5', 'w')
+        f.create_dataset(
+            'zstd',
+            data=numpy.arange(100),
+            **hdf5plugin.Zstd())
+        f.close()
+    """
+    filter_id = ZSTD_ID
 
 
 def _init_filters():