Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] fix bug in new dict decoder #13939

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

patrick-schultz
Copy link
Collaborator

The decoder uses a StagedArrayBuilder to hold elements while being sorted. The array builder is stored in a class field. When the same decoder function is called more than once, that array builder is reused.

Before this fix, the array builder was never cleared, so if the decoder function was called more than once, the array builder would still contain the elements from previously decoded dicts.

Since it's highly non-obvious that you need to call clear immediately after new StagedCodeBuilder, this PR makes the constructor take a CodeBuilder, and always inserts a clear at the call site. I also took the opportunity to CodeBuilderify the rest of the interface.

@danking
Copy link
Contributor

danking commented Oct 28, 2023

It still seems like a foot gun that I can’t use two staged code builders at once. Like, if I was implementing a “partition” function that produced two arrays from one, I might naively try to use two staged array builders. Are you saying they’ll actually be the same array builder?

@patrick-schultz
Copy link
Collaborator Author

patrick-schultz commented Oct 29, 2023

Using multiple staged array builders is fine. It's using a staged array builder in a function (or loop) which is called multiple times that's the issue.

Calling new StagedArrayBuilder(...) before only took a method builder, not a code builder. It created a new field on the containing class to hold the runtime array builder, and added code to the class constructor to initialize it. So if the generated code using that array builder is executed multiple times, it will be reusing the same array builder object in the same class field. But calling new StagedArrayBuilder multiple times will create multiple array builder objects in multiple fields.

The fix here is for the StagedArrayBuilder constructor to take a code builder, and to insert code to initialize the array builder to an empty state in the actual function, not just the containing class constructor.

@danking
Copy link
Contributor

danking commented Oct 30, 2023

@patrick-schultz OK, I understand now. Can you ping Ed or find someone else to review? Let's get this in post-haste.

@danking danking merged commit 28e56f7 into hail-is:main Oct 30, 2023
8 checks passed
@ehigham
Copy link
Member

ehigham commented Oct 30, 2023

I'm a little puzzled by this solution. I feel that we shouldn't be modifying class composition in method calls; the structure of our code seems somewhat upside down. If anything the StagedArrayBuilder should only take a class builder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants