[GLUTEN-8453] [VL] Allow Heavy Batch to be Processed by ColumnarCachedBatchSerializer #8454
ArnavBalyan commented Jan 7, 2025 (edited)
- Currently, ColumnarCachedBatchSerializer does not support Arrow heavy batches.
- ColumnarCachedBatchSerializer expects a light batch to offload to native. In most cases it receives an already offloaded batch, but it fails when the input is a heavy batch.
- Added a conversion that offloads the batch if the upstream operator produced an ArrowJavaBatch.
- Also makes the light/heavy batch checks public, since they are useful utility functions and contain no critical logic.
- Note: This fix makes it work, but ideally it should work with RAS and be compatible with the transitions that were added. To do this, we can wrap InMemoryTableScanExec and register it as a Gluten operator so that it offloads cleanly. I'll investigate this as part 2.
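To illustrate the conditional offload described in the list above, here is a minimal, self-contained sketch. Every type and method in it (`Batch`, `LightBatch`, `HeavyBatch`, `ToyNativeStore`, `ToySerializer`, `ensureLight`) is a hypothetical stand-in invented for illustration. None of them are Gluten's real classes, and the actual change in this PR may be structured differently.

```scala
// Hypothetical stand-ins for illustration only; NOT Gluten's real classes.
sealed trait Batch { def numRows: Int }

// "Light" batch: holds only a handle to data owned by the native side.
final case class LightBatch(handle: Long, numRows: Int) extends Batch

// "Heavy" batch: column data is materialized on the JVM (e.g. Arrow Java vectors).
final case class HeavyBatch(columns: Vector[Vector[Int]], numRows: Int) extends Batch

// Toy model of the native side: a map from handle to column data.
object ToyNativeStore {
  private val store = scala.collection.mutable.Map.empty[Long, Vector[Vector[Int]]]
  private var nextHandle = 0L

  // Copy a heavy batch into the toy native store and return a light handle to it.
  def offload(b: HeavyBatch): LightBatch = {
    nextHandle += 1
    store(nextHandle) = b.columns
    LightBatch(nextHandle, b.numRows)
  }

  // Serialization only understands light batches, mirroring the serializer's expectation.
  def serialize(b: LightBatch): Array[Byte] =
    store(b.handle).flatten.map(_.toByte).toArray
}

object ToySerializer {
  // Pass light batches through unchanged; offload heavy batches first.
  def ensureLight(b: Batch): LightBatch = b match {
    case l: LightBatch => l
    case h: HeavyBatch => ToyNativeStore.offload(h)
  }

  def serialize(b: Batch): Array[Byte] =
    ToyNativeStore.serialize(ensureLight(b))
}
```

With this shape, a caller can hand either kind of batch to `ToySerializer.serialize`, and an already offloaded batch is not converted a second time.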
@@ -17,7 +17,7 @@
 package org.apache.spark.sql.execution

 import org.apache.gluten.backendsapi.BackendsApiManager
-import org.apache.gluten.columnarbatch.ColumnarBatches
+import org.apache.gluten.columnarbatch.{ColumnarBatches, VeloxColumnarBatches}
@zhztheplayer Ok to add VeloxColumnarBatches here?
Yes, it's normal to call the utility from Velox backend code. However, it seems debatable whether we should rely on isLightBatch / isHeavyBatch to add conditional transitions.
@ArnavBalyan Would you like to help check whether we can somehow add explicit transition nodes (LoadArrowData / OffloadArrowData) into the query plan instead of the PR's change? Or is the last Note in the PR description meant for something similar? Thanks!
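As a rough illustration of the alternative raised here (explicit transition nodes in the plan rather than a runtime light/heavy check), the following toy sketch inserts a transition wherever a child's output batch type differs from what its parent requires. All names (`BatchType`, `Plan`, `Source`, `Consumer`, `Transition`, `InsertTransitions`) are hypothetical and made up for this example. It does not show how Gluten's LoadArrowData / OffloadArrowData nodes or its transition rules are actually implemented.

```scala
// Hypothetical toy model; none of these types are Gluten's real API.
sealed trait BatchType
case object Light extends BatchType
case object Heavy extends BatchType

sealed trait Plan {
  def output: BatchType // batch type this node produces
  def children: Seq[Plan]
}

// A leaf that produces batches of a fixed type.
final case class Source(name: String, output: BatchType) extends Plan {
  val children: Seq[Plan] = Nil
}

// An operator that requires a specific batch type from its child.
final case class Consumer(name: String, requires: BatchType, child: Plan) extends Plan {
  val output: BatchType = requires
  val children: Seq[Plan] = Seq(child)
}

// Explicit transition node: converts the child's batches to `output`.
final case class Transition(output: BatchType, child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

object InsertTransitions {
  // Rewrite the plan bottom-up, adding a Transition on every type mismatch.
  def apply(plan: Plan): Plan = plan match {
    case s: Source => s
    case Transition(to, child) => Transition(to, apply(child))
    case c @ Consumer(_, requires, child) =>
      val fixedChild = apply(child)
      if (fixedChild.output == requires) c.copy(child = fixedChild)
      else c.copy(child = Transition(requires, fixedChild))
  }
}
```

The design point is that the conversion becomes visible in the plan itself, so each operator can assume its input type instead of checking it per batch. As the thread goes on to note, the serializer is not a plan operator, which is why this approach does not apply to it directly.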
Could also refer to a previous effort #7313 if needed.
Thanks for the review @FelixYBW @zhztheplayer!
Yes, the note was meant for that. Ideally, the transitions would have added the correct transition node before this point. However, the serializer is a special case, since it's not an operator and does not extend GlutenPlan. I have some ideas to explore here, which may require some design changes in the serializer to make it work with transitions.
Would it be possible to merge this for now, since the ColumnarRange operator depends on it? I'll then work on making the serializer compatible with transitions. Let me know what you think, thanks!
> However the serializer is a special case since it's not an operator and does not extend the GlutenPlan
Agreed. The code path is different. Thanks for figuring this out.
Do you think we can add a UT for the change in this PR, if it can be considered an individual fix?
Sure, let me add it in the ColumnarRangeExec, since it already has the failing UT. Thanks!
Yes, please feel free to move forward with the Range PR. I am also testing the relevant code and will help add a test case here.
…for limited use cases (#8463)