feat: add support for RNTuple RC2 #1115

ariostas · 2024-02-01T17:52:27Z

This is still work in progress. I'll update this comment to when there are new changes.

Current status:

Seems like everything is read correctly (in a very simple test), but the column logic is broken.
The writing part hasn't been updated.

[skip ci]

ariostas · 2024-03-06T20:13:42Z

All RNTuple tests are passing now (although they require scikit-hep/scikit-hep-testdata#144), so I think this is pretty close to being done. One thing I want to confirm is whether we want to make xxhash a dependency or not. I was working with the assumption that it should be a dependency, but in a brief discussion with @jpivarski it seemed like maybe it was preferable to have it only as a writing dependency. I'll fix it if that's the case.

I haven't confirmed that all the fancy features of RNTuple are working, but I think the basic functionality is good enough for now and I can look into further fixes and improvements in other PRs.

ariostas · 2024-03-07T15:54:05Z

I'm gonna reply to scikit-hep/scikit-hep-testdata#144 (comment) here since it was probably meant to be posted here.

Yeah, that's a good point. xxhash is not currently in cramjam and I'm not sure if they would be open to include it since a "good" implementation should be very fast, given that that's its main selling point. It could be that they're willing to implement a slow version for compatibility reasons. There's other alternatives like ppxxh that implement xxhash in pure Python, so they're compatible with Pyodide, but I'm not sure if they're actively maintained. For now, I'll remove the dependency and we can discuss it further later on.

Moelf · 2024-03-07T16:04:12Z

I can confirm ppxxh works in the sense that I implemented a native XXhash (https://github.com/Moelf/XXHashNative.jl) based on spec and used ppxxh for debugging.

As for the speed, hash is hard to get fast... as you can see my naive Julia implementation only has 33% of the speed compare to wrapper around the reference C implementation. In fact the thing I wish existed when I implemented it was a Numba or Jax implementation... because all the class-OOP implementations are not straightforward to learn from

ariostas · 2024-03-21T15:19:47Z

I marked this as ready for review, but it would probably be better to merge and release scikit-hep/scikit-hep-testdata#144 before merging this PR so that we don't have a bunch of failing tests.

jpivarski · 2024-03-21T20:48:20Z

scikit-hep/scikit-hep-testdata#144 is included in scikit-hep-testdata 0.4.41, so I'm going to trigger these tests to run again. Hopefully, they should pass with the new RNTuple files.

jpivarski

This is very nice! I see that extensive changes were needed, but they're all using the chunk/cursor/struct tools appropriately. I see that you even updated some comments, generated from cursor.debug, to serve as a guide. (Thank you! Outdated comments are the worst.)

I see that the test is picking up scikit-hep-testdata-0.4.39. That's weird; why not 0.4.41? I'm going to try running the tests locally, to be sure they work with the new files.

ariostas · 2024-03-21T22:10:44Z

@jpivarski the upload of the test data package to PyPI from the CI is failing. It seems like it's been broken for a couple of weeks because v0.4.40 also failed to be deployed to the PyPI. Could it have anything to do with the time when there was a bug with GHA secrets and you did some cleanup for a bunch of repos?

https://github.com/scikit-hep/scikit-hep-testdata/actions/runs/8379922201/job/22947949026

ariostas · 2024-03-21T22:24:02Z

Also, I'm really confused how the test passed when it downloaded 0.4.39. Does it download the files from somewhere else?

jpivarski · 2024-03-21T22:26:36Z

Could it have anything to do with the time when there was a bug with GHA secrets and you did some cleanup for a bunch of repos?

That's what it was. scikit-hep-testdata 0.4.42 is out now. However, the tests are and were failing for me, when I've been running them locally. I don't understand why they passed in CI.

ariostas · 2024-03-21T22:30:37Z

I had a bunch of issues when using scikit-hep-testdata locally. Even if I installed the new version it kept using the old files. I had to manually go where it keeps the files and put the new ones in there. I think it must be downloading them fresh instead of using what's packaged in the wheel for some reason.

jpivarski · 2024-03-21T22:38:42Z

That's right! The files are in ~/.local/skhepdata, and this is a weakness for scikit-hep-testdata, at least for any files that are ever changed in place. Normally, we only add new ones, but for RNTuple, it makes sense to replace them. (Maybe. Maybe it would have been better to give them new names. Oh well! Too late now!)

I'm going to flush my local cache and try to test again locally.

jpivarski · 2024-03-21T22:42:12Z

That did it, for my tests locally.

It may be that this test workflow has a cache of its own? No, I don't see it in the workflow YAMLs.

jpivarski · 2024-03-21T22:46:42Z

With the new RNTuple files (and not with the old RNTuple files), all of the tests pass locally for me. The CI might have been incorrect when it passed before, but a careful check says it's okay and the CI is giving a green light, so it's okay to merge it now.

Go ahead and do that—I'm asking you to, in order to manage the concurrency problem of the fact that you might still be trying things on this branch.

ariostas and others added 10 commits February 1, 2024 12:39

Started implementing RNTuple RC2 spec

798c9fe

style: pre-commit fixes

b643fd7

Header is now read correctly

8f74d83

Footer is now read correctly

d59f356

[skip ci]

Merge pre-commit changes.

2afc5b7

[skip ci]

style: pre-commit fixes

e09bad6

Reading page links now seems to work

502a91f

Mostly working, but column logic is broken

a8978e5

[skip ci]

Columns of strings can now be read correctly

1f98ef6

Merge branch 'main' into ariostas/rntuple_rc2

06bf08c

ariostas mentioned this pull request Mar 6, 2024

chore: update RNTuple files to RC2 scikit-hep/scikit-hep-testdata#144

Merged

All RNTuple tests pass now

1a55022

ariostas added 2 commits March 7, 2024 16:52

Moved xxhash back to only being a testing dependency

e6cde44

Merge branch 'main' into ariostas/rntuple_rc2

3037545

ariostas marked this pull request as ready for review March 21, 2024 15:18

jpivarski approved these changes Mar 21, 2024

View reviewed changes

jpivarski mentioned this pull request Mar 21, 2024

chore: update pre-commit hooks #1169

Merged

ariostas merged commit e29932f into main Mar 22, 2024
40 checks passed

ariostas deleted the ariostas/rntuple_rc2 branch March 22, 2024 12:14

ariostas mentioned this pull request Mar 22, 2024

Only the latest data files are accessible scikit-hep/scikit-hep-testdata#147

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add support for RNTuple RC2 #1115

feat: add support for RNTuple RC2 #1115

ariostas commented Feb 1, 2024 •

edited

Loading

ariostas commented Mar 6, 2024

ariostas commented Mar 7, 2024

Moelf commented Mar 7, 2024 •

edited

Loading

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski left a comment

ariostas commented Mar 21, 2024

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski commented Mar 21, 2024

feat: add support for RNTuple RC2 #1115

feat: add support for RNTuple RC2 #1115

Conversation

ariostas commented Feb 1, 2024 • edited Loading

ariostas commented Mar 6, 2024

ariostas commented Mar 7, 2024

Moelf commented Mar 7, 2024 • edited Loading

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski left a comment

Choose a reason for hiding this comment

ariostas commented Mar 21, 2024

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

ariostas commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski commented Mar 21, 2024

jpivarski commented Mar 21, 2024

ariostas commented Feb 1, 2024 •

edited

Loading

Moelf commented Mar 7, 2024 •

edited

Loading