-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for RNTuple RC2 #1115
Conversation
[skip ci]
[skip ci]
All RNTuple tests are passing now (although they require scikit-hep/scikit-hep-testdata#144), so I think this is pretty close to being done. One thing I want to confirm is whether we want to make I haven't confirmed that all the fancy features of RNTuple are working, but I think the basic functionality is good enough for now and I can look into further fixes and improvements in other PRs. |
I'm gonna reply to scikit-hep/scikit-hep-testdata#144 (comment) here since it was probably meant to be posted here. Yeah, that's a good point. xxhash is not currently in cramjam and I'm not sure if they would be open to include it since a "good" implementation should be very fast, given that that's its main selling point. It could be that they're willing to implement a slow version for compatibility reasons. There's other alternatives like ppxxh that implement xxhash in pure Python, so they're compatible with Pyodide, but I'm not sure if they're actively maintained. For now, I'll remove the dependency and we can discuss it further later on. |
I can confirm As for the speed, hash is hard to get fast... as you can see my naive Julia implementation only has 33% of the speed compare to wrapper around the reference C implementation. In fact the thing I wish existed when I implemented it was a Numba or Jax implementation... because all the class-OOP implementations are not straightforward to learn from |
I marked this as ready for review, but it would probably be better to merge and release scikit-hep/scikit-hep-testdata#144 before merging this PR so that we don't have a bunch of failing tests. |
scikit-hep/scikit-hep-testdata#144 is included in scikit-hep-testdata 0.4.41, so I'm going to trigger these tests to run again. Hopefully, they should pass with the new RNTuple files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice! I see that extensive changes were needed, but they're all using the chunk/cursor/struct tools appropriately. I see that you even updated some comments, generated from cursor.debug
, to serve as a guide. (Thank you! Outdated comments are the worst.)
I see that the test is picking up scikit-hep-testdata-0.4.39
. That's weird; why not 0.4.41? I'm going to try running the tests locally, to be sure they work with the new files.
@jpivarski the upload of the test data package to PyPI from the CI is failing. It seems like it's been broken for a couple of weeks because v0.4.40 also failed to be deployed to the PyPI. Could it have anything to do with the time when there was a bug with GHA secrets and you did some cleanup for a bunch of repos? https://github.com/scikit-hep/scikit-hep-testdata/actions/runs/8379922201/job/22947949026 |
Also, I'm really confused how the test passed when it downloaded 0.4.39. Does it download the files from somewhere else? |
That's what it was. scikit-hep-testdata 0.4.42 is out now. However, the tests are and were failing for me, when I've been running them locally. I don't understand why they passed in CI. |
I had a bunch of issues when using |
That's right! The files are in I'm going to flush my local cache and try to test again locally. |
That did it, for my tests locally. It may be that this test workflow has a cache of its own? No, I don't see it in the workflow YAMLs. |
With the new RNTuple files (and not with the old RNTuple files), all of the tests pass locally for me. The CI might have been incorrect when it passed before, but a careful check says it's okay and the CI is giving a green light, so it's okay to merge it now. Go ahead and do that—I'm asking you to, in order to manage the concurrency problem of the fact that you might still be trying things on this branch. |
This is still work in progress. I'll update this comment to when there are new changes.
Current status: