Add Jenkin's lookup3 as a 32-bit checksum for HDF5 #445

Closed
mkitti opened this issue Jul 12, 2023 · 3 comments · Fixed by #446

@mkitti
Contributor

mkitti commented Jul 12, 2023

Minimal, reproducible code sample, a copy-pastable example if possible

# Jenkins' lookup3 can be used to verify HDF5 data structures, such as the 48-byte superblock read below
import jenkins_cffi

with open("original_hdf5_zarr_shard_demo.h5", "rb") as f:
    b = f.read(48)

hash_bytes = jenkins_cffi.hashlittle(bytes(b[:-4])).to_bytes(4, "little")
print(b[-4:] == hash_bytes) # True

Problem description

Bob Jenkins' lookup3 hash is an integral component of the HDF5 specification, which uses it as the checksum for its internal data structures.

This becomes relevant if we would like to reuse HDF5 data structures. For example, the HDF5 Fixed Array Data Block can be made byte-compatible with the proposed Zarr shard specification, except for the four-byte checksum. Currently, the only permitted checksum is crc32.

zarr-developers/zarr-specs#152 (comment)

Implementations of Bob Jenkins' lookup3 are widely available across many languages.
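For reference, here is a minimal pure-Python sketch of the `hashlittle` variant of lookup3, following the byte-wise (endian-independent) path of the public-domain `lookup3.c`. The names `lookup3_hashlittle` and `has_valid_trailing_checksum` are hypothetical and are not part of numcodecs or jenkins-cffi. It assumes an initial value of 0 and a little-endian checksum stored in the last four bytes, as in the superblock example above.

```python
MASK = 0xFFFFFFFF  # keep all arithmetic within 32 bits


def _rot(x: int, k: int) -> int:
    """Rotate a 32-bit value left by k bits."""
    return ((x << k) | (x >> (32 - k))) & MASK


def _words(chunk: bytes):
    """Split a 12-byte chunk into three little-endian 32-bit words."""
    return (
        int.from_bytes(chunk[0:4], "little"),
        int.from_bytes(chunk[4:8], "little"),
        int.from_bytes(chunk[8:12], "little"),
    )


def lookup3_hashlittle(data: bytes, initval: int = 0) -> int:
    """Sketch of lookup3's hashlittle(); returns a 32-bit integer."""
    data = bytes(data)
    length = len(data)
    a = b = c = (0xDEADBEEF + length + initval) & MASK
    offset = 0

    # Consume all but the last block of at most 12 bytes.
    while length > 12:
        w0, w1, w2 = _words(data[offset:offset + 12])
        a = (a + w0) & MASK
        b = (b + w1) & MASK
        c = (c + w2) & MASK
        # mix(a, b, c)
        a = (a - c) & MASK; a ^= _rot(c, 4);  c = (c + b) & MASK
        b = (b - a) & MASK; b ^= _rot(a, 6);  a = (a + c) & MASK
        c = (c - b) & MASK; c ^= _rot(b, 8);  b = (b + a) & MASK
        a = (a - c) & MASK; a ^= _rot(c, 16); c = (c + b) & MASK
        b = (b - a) & MASK; b ^= _rot(a, 19); a = (a + c) & MASK
        c = (c - b) & MASK; c ^= _rot(b, 4);  b = (b + a) & MASK
        offset += 12
        length -= 12

    # Zero remaining bytes: the reference code returns c without final().
    if length == 0:
        return c

    # Zero-padding the tail is equivalent to the fall-through switch in C.
    w0, w1, w2 = _words(data[offset:] + bytes(12 - length))
    a = (a + w0) & MASK
    b = (b + w1) & MASK
    c = (c + w2) & MASK

    # final(a, b, c)
    c ^= b; c = (c - _rot(b, 14)) & MASK
    a ^= c; a = (a - _rot(c, 11)) & MASK
    b ^= a; b = (b - _rot(a, 25)) & MASK
    c ^= b; c = (c - _rot(b, 16)) & MASK
    a ^= c; a = (a - _rot(c, 4)) & MASK
    b ^= a; b = (b - _rot(a, 14)) & MASK
    c ^= b; c = (c - _rot(b, 24)) & MASK
    return c


def has_valid_trailing_checksum(block: bytes) -> bool:
    """Check a block whose last 4 bytes hold a little-endian lookup3 checksum."""
    expected = lookup3_hashlittle(block[:-4]).to_bytes(4, "little")
    return bytes(block[-4:]) == expected
```

With this sketch, the check in the code sample above would read `has_valid_trailing_checksum(b)`. A Cython or vendored C implementation would be preferable for real workloads, since this version processes the input in pure Python, 12 bytes at a time.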

Version and installation information

Please provide the following:

jenkins-cffi              1.0.2.1                  pypi_0    pypi
python                    3.11.0          he550d4f_1_cpython    conda-forge
@mkitti
Contributor Author

mkitti commented Jul 12, 2023

Would the best route be to use Cython, as @martindurant did in #412 for fletcher32, or would it make sense to add jenkins-cffi ( https://github.com/what-studio/jenkins-cffi/ ) as a dependency?

@rabernat
Contributor

We tend to favor either vendoring outside libraries or reimplementing algorithms in Cython. (Numcodecs has no dependencies other than numpy.) Both options seem plausible here. I'd go for the path of least resistance.

@mkitti - are you interested in working on a PR to add this codec? We will be glad to support you.

@mkitti
Contributor Author

mkitti commented Jul 12, 2023

I would be happy to work on it if there is a clear path forward. I would prefer to avoid the outcome of the SZIP/libaec effort with the pull request: #420 / #422
