Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38045: [C#] Add flight dictionary support #38046

Closed

Conversation

alexwilcoxson-rel
Copy link

@alexwilcoxson-rel alexwilcoxson-rel commented Oct 5, 2023

Rationale for this change

This PR adds initial support for Dictionary arrays over Arrow flight in C# server and client.

What changes are included in this PR?

Updated FlightDataStream and Flight record batch reader to support dictionaries. Updated ArrowStreamWriter to support resending dictionaries, only enabled for Flight use case. Updated FlightTests.

Are these changes tested?

Updated FlightTests to include DictionaryArrays

Are there any user-facing changes?

No

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

⚠️ GitHub issue #38045 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 13, 2023
@alexwilcoxson-rel
Copy link
Author

@westonpace you're listed as codeowner, could you take a look please?

@CurtHagenlocher
Copy link
Contributor

I can try to take a look at this over the US holiday; I've avoided it so far because it changes code I'm not familiar with.

@CurtHagenlocher
Copy link
Contributor

CurtHagenlocher commented Dec 2, 2023

(still working my way up to this; it looks like dictionaries are problematic in general for C# IPC and so I want to understand what's going wrong there before looking at a Flight-specific dictionary batch issue)

@@ -24,13 +24,13 @@ class DictionaryMemo
{
private readonly Dictionary<long, IArrowArray> _idToDictionary;
private readonly Dictionary<long, IArrowType> _idToValueType;
private readonly Dictionary<Field, long> _fieldToId;
private readonly Dictionary<string, long> _fieldToId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is technically correct because it ignores the fact that e.g. two nested structs could have dictionary-encoded fields with the same name and these would now collide. Obviously the existing implementation isn't great either in that it implicitly relies on reference equality (which I assume is what causes problems for Flight). I don't have any immediate suggestions though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. When I have time to get back to this I will write a test for this and look into it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check out my attempt at an implementation at https://github.com/CurtHagenlocher/arrow/tree/dev/curth/FlightDictionaries. It changes the write side of flight server so that the same schema is used to collect the dictionaries for each batch. This continues to let reference equality work for calculating dictionary IDs. I also refactored the "Post" changes to be (what I think is) a little cleaner.

Disclaimer: I am not very familiar with the Flight code or protocol, but your changed tests are passing.

@CurtHagenlocher
Copy link
Contributor

FYI @alexwilcoxson-rel I made substantial changes to dictionaries in #39146 because they weren't working for files or in-memory data. I don't think there's much overlap between that change and this one; the sole exception is probably the hooks to use for tracking dictionary writes. These may need to be reconciled.

@CurtHagenlocher
Copy link
Contributor

@alexwilcoxson-rel , are you still interested in completing this PR? Should I pursue the work I started in my branch?

@alexwilcoxson-rel
Copy link
Author

Hi @CurtHagenlocher apologies for not following up, we have had other priorities come up and needing dictionary support for flight dotnet is on our backlog. I will close this and feel free to pick anything up on your end. Once we get to that ticket on our end I will check in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C#] Add support for sending dictionary arrays via flight
3 participants