feat: all essential RNTuple writing functionality #1395

Merged: ariostas merged 22 commits into main from ariostas/rntuple_lists_and_structs on Mar 6, 2025

Collaborator

@ariostas ariostas commented Feb 27, 2025

This PR extends the writing functionality for RNTuples quite a bit. It adds support for the following:

  • Lists of booleans
  • Lists of strings
  • Jagged arrays (including nested jagged arrays)
  • Rectangular arrays
  • Structs
  • Lists of structs
  • Optional types
  • Union types
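
As a conceptual illustration of the jagged and nested jagged cases in the list above, the sketch below flattens a list of lists into an offsets array plus flat content, which is the general columnar layout for variable-length data (Awkward's `ListOffsetArray` uses the same idea). The helper `to_offsets_and_content` is hypothetical and written only for this example; it is not the code this PR adds to the writer.

```python
def to_offsets_and_content(jagged):
    """Flatten one level of nesting into (offsets, content).

    offsets[i]:offsets[i + 1] is the slice of content that belongs to entry i.
    """
    offsets = [0]
    content = []
    for sublist in jagged:
        content.extend(sublist)
        offsets.append(len(content))
    return offsets, content


# One level of nesting: a single offsets array is enough.
jagged_list = [[1], [2, 3], [4, 5, 6]]
offsets, content = to_offsets_and_content(jagged_list)

# Two levels of nesting: apply the same step twice, once per level.
nested_list = [[[1], []], [[2], [3, 3]], [[4, 5, 6]]]
outer_offsets, inner_lists = to_offsets_and_content(nested_list)
inner_offsets, values = to_offsets_and_content(inner_lists)

print(offsets, content)
print(outer_offsets, inner_offsets, values)
```

Each extra level of nesting adds one more offsets array on top of the same flat values, which is why nested jagged arrays follow naturally once single-level jagged arrays are handled.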

Collaborator Author

ariostas commented Mar 5, 2025

All the functionality is now ready. I'll do a bit of cleanup tomorrow morning, and then it's ready to go.

@ianna ianna mentioned this pull request Mar 6, 2025
@ariostas ariostas changed the title feat: write RNTuples with strings, structs, and nested lists feat: all essential RNTuple writing functionality Mar 6, 2025
Collaborator Author

ariostas commented Mar 6, 2025

This is all I was planning to do for now. I ended up adding all the essential functionality I could think of. Some Awkward forms/arrays are still not supported, but I think they are probably not too common and don't have a direct translation to RNTuple fields, so I'll deal with them later. You can see from this test data that all the essentials should be covered.

import awkward as ak
import numpy

data = ak.Array(
    {
        "bool": [True, False, True],
        "int": [1, 2, 3],
        "float": [1.1, 2.2, 3.3],
        "jagged_list": [[1], [2, 3], [4, 5, 6]],
        "nested_list": [[[1], []], [[2], [3, 3]], [[4, 5, 6]]],
        "string": ["one", "two", "three"],
        "utf8_string": ["こんにちは", "⚛️💫🎆😀", "ǧ̸̛̫͍̰͖̟̈͛͑͆̆̌̃̉̅̄̔̈́̀̔͆̄͋̍͐͂̎͗̈́͒͘͝ͅö̴̮̝̪̬͎͚̜̖̜͖̞̤͕̙͂̀̀̊͛͑̈́͛͐͊͂͂̇͛̾̔͐͆͑͂̓̅̀͘͘͘̕͝͠͝͝ơ̶͍̙̻̾̈́̓̈́̀̅͑ḑ̷͚̠̹̗͉͙̞͇͕̼̲̥͉̯̞͕̲̻̞͗̓̃̊̅͗͊͊́̑̈́̎͋̇̓͛̅͜͜͠͝ͅb̷̢̢̨̨̛̛̘̠̞̰̺̘̰̖̺̞̱͇̰̙̲̱̪͕͎͉̖̞͇̹̮͙͋̀͑͂̈́̇͛̐͊̀̇͆̓̋̀̿̋̂̅̀̌̑̓̽͊̂͑̈̇̚͜͝y̶̗͇̠̞͚̦̮̦͈̹̥̋̓̓̈́̐̆̀̄̋̂̀̇͋̎̚͜͝ȩ̷̢̡͇̮̩̹̥̬̰͎͔̬̩̰̯͍̲͎̭͉̬̣̻̖͍̥̟̪͕̫̟̋̔̀͆̑̈́̐̃͐͌̍͒̔̈́̃̈́̐̔̾͊̿̓͆͑̚͜͝͝͝ͅ"],
        # Fixed-size (rectangular) array built explicitly from an Awkward layout
        "regular": ak.Array(
            ak.contents.RegularArray(
                ak.contents.NumpyArray([1, 2, 3, 4, 5, 6, 7, 8, 9]), 3
            )
        ),
        # Fixed-size (rectangular) array coming from a 2D NumPy array
        "numpy_regular": numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
        "struct": [{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}],
        "struct_list": [
            [{"x": 1}, {"x": 2}],
            [{"x": 3}, {"x": 4}],
            [{"x": 5}, {"x": 6}],
        ],
        "tuple": [(1, 2), (3, 4), (5, 6)],
        "tuple_list": [[(1,), (2,)], [(3,), (4,)], [(5,), (6,)]],
        "optional": [1, None, 2],
        "union": [1, 2, "three"],
        "optional_union": [1, None, "three"],
    }
)

It's definitely not perfect, but I'll improve things later. I think it would be good to advertise that we now have experimental writing functionality and see what feedback we start getting.

@ariostas ariostas marked this pull request as ready for review March 6, 2025 15:38
Collaborator

@ianna ianna left a comment

@ariostas - Great! It looks good to me. Please merge if you are done with it.

"jagged_list": [[1], [2, 3], [4, 5, 6]],
"nested_list": [[[1], []], [[2], [3, 3]], [[4, 5, 6]]],
"string": ["one", "two", "three"],
"utf8_string": ["こんにちは", "⚛️💫🎆😀", "ǧ̸̛̫͍̰͖̟̈͛͑͆̆̌̃̉̅̄̔̈́̀̔͆̄͋̍͐͂̎͗̈́͒͘͝ͅö̴̮̝̪̬͎͚̜̖̜͖̞̤͕̙͂̀̀̊͛͑̈́͛͐͊͂͂̇͛̾̔͐͆͑͂̓̅̀͘͘͘̕͝͠͝͝ơ̶͍̙̻̾̈́̓̈́̀̅͑ḑ̷͚̠̹̗͉͙̞͇͕̼̲̥͉̯̞͕̲̻̞͗̓̃̊̅͗͊͊́̑̈́̎͋̇̓͛̅͜͜͠͝ͅb̷̢̢̨̨̛̛̘̠̞̰̺̘̰̖̺̞̱͇̰̙̲̱̪͕͎͉̖̞͇̹̮͙͋̀͑͂̈́̇͛̐͊̀̇͆̓̋̀̿̋̂̅̀̌̑̓̽͊̂͑̈̇̚͜͝y̶̗͇̠̞͚̦̮̦͈̹̥̋̓̓̈́̐̆̀̄̋̂̀̇͋̎̚͜͝ȩ̷̢̡͇̮̩̹̥̬̰͎͔̬̩̰̯͍̲͎̭͉̬̣̻̖͍̥̟̪͕̫̟̋̔̀͆̑̈́̐̃͐͌̍͒̔̈́̃̈́̐̔̾͊̿̓͆͑̚͜͝͝͝ͅ"],
Collaborator

wow!


for f in data.fields:
    if "tuple" in f:
        # TODO: tuples are converted to records
Collaborator

I guess this will fail after your PR is merged and released?

Collaborator Author

Yeah, let me fix it now.

Collaborator Author

Actually, this still works after the other PR, since other parts of the RNTuple reading code still need to be updated. I'll handle all of that in a refactoring of the reading code that I'm planning for the next couple of weeks.
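
To make the tuple TODO in this thread concrete: tuples are written as tuples but currently come back as records, so a direct equality check between the two would not hold. A plain-Python sketch of one way a test could compare them is below. The field names `"0"` and `"1"` and the helper `record_to_tuple` are assumptions made for illustration, not what the reader is guaranteed to produce.

```python
def record_to_tuple(record):
    """Drop the field names of a record and keep its values in order."""
    return tuple(record.values())


# What was written: a list of tuples.
written = [(1, 2), (3, 4), (5, 6)]

# What a reader that converts tuples to records might hand back
# (hypothetical field names).
read_back = [{"0": 1, "1": 2}, {"0": 3, "1": 4}, {"0": 5, "1": 6}]

# Comparing values only makes the check independent of the field names.
normalized = [record_to_tuple(r) for r in read_back]
assert normalized == written
```

Once the reading code returns real tuples, a special case like this can be dropped and the generic comparison used for every field.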

@ariostas ariostas merged commit 808a723 into main Mar 6, 2025
26 checks passed
@ariostas ariostas deleted the ariostas/rntuple_lists_and_structs branch March 6, 2025 17:57