-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3620][VL] Support Range operator for Velox Backend #8161
base: main
Are you sure you want to change the base?
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI on x86 |
2 similar comments
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does velox currently have any similar operator that we can leverage or a new implementation will be needed in case we want to offload it.
import org.apache.spark.sql.GlutenSQLTestsTrait | ||
import org.apache.spark.sql.Row | ||
|
||
class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suite would need to be enabled in VeloxTestSettings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabled
Yes, I'm planning to work on the offload implementation. Currently it seems unlikely we will have to introduce code in velox. Will raise PR once possible thanks. |
* @param child | ||
* Child SparkPlan nodes for this operator, if any. | ||
*/ | ||
case class RangeExecTransformer( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name it "ColumnarRangeExec" (or VeloxColumnarRangeExec
or something) as it doesn't extend the TransformSupport
trait? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this thanks, I am planning to add native support soon, will name it accordingly once it's done
override protected def doValidateInternal(): ValidationResult = { | ||
val isSupported = BackendsApiManager.getSettings.supportRangeExec() | ||
|
||
if (!isSupported) { | ||
return ValidationResult.failed( | ||
s"RangeExec is not supported by the current backend." | ||
) | ||
} | ||
ValidationResult.succeeded | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think usually we have to add some code to this validator to make doValidate
be used. Can you help check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am opening #8177 that may simplify the usage of validators. Perhaps could do a rebase once that PR merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, rebased since #8177 is merged, I believe we won't need to add to this validator since currently the code does not interact with native or rely on native validation. It may be needed once offloaded to velox
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArnavBalyan The validator's name is a little bit misleading, though I think it's the one to call doValidate
, then doValidateInternal
to make sure BackendsApiManager.getSettings.supportRangeExec()
is checked. Would you like to double check? Or is there another code path calling RangeExecTransformer.doValidateInternal
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes makes sense, let me double check and get back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checked, the this is defined in offloadothers, which is wired upto validatorBuilder that calls doValidate and then doValidateInternal. This will trigger the check, UTs also confirm it, I see it's currently similarly implemented for other columnar java operators as well thanks!
current += numRows * step | ||
|
||
val batch = new ColumnarBatch(vectors.asInstanceOf[Array[ColumnVector]], numRows) | ||
val offloadedBatch = ColumnarBatches.offload(allocator, batch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the code, the operator's behaviour acutally align more precisely with
override def batchType(): Convention.BatchType = {
ArrowNativeBatch
}
(as ColumnarBatches.offload
results in native Arrow batch)
Would you see if we can add the code to this class? If yes we can also remove RangeExecBaseTransformer
's default implementation.
Doing this will make query planner to insert an explicit ArrowToVelox
c2c transition into query plan so we can easily collect the transition's metrics.
Futhermore, I will suggest remove line 117 and line 118 then return the batch
directly, then have the batch type changed as
override def batchType(): Convention.BatchType = {
ArrowJavaBatch
}
So two explicit c2cs (ArrowJava-to-ArrowNative, ArrowNative-to-Velox) can be inserted to query plan.
More details please refer to test code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import org.apache.spark.sql.GlutenSQLTestsTrait | ||
import org.apache.spark.sql.Row | ||
|
||
class GlutenSQLRangeExecSuite extends GlutenSQLTestsTrait { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need test suites for all Spark versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to keep it only under 33
We have some early discussion on this to map it to |
@ArnavBalyan would you please check the failed unit tests? thanks, -yuan |
Hi @zhouyuan thanks for the review, interesting thread, I will go through! I'll fix the tests today/tomorrow thanks! |
e30bbcf
to
0fcbd7f
Compare
Run Gluten Clickhouse CI on x86 |
2 similar comments
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
f656962
to
ae21c8b
Compare
Run Gluten Clickhouse CI on x86 |
ae21c8b
to
d968dd9
Compare
Run Gluten Clickhouse CI on x86 |
d968dd9
to
fe039ff
Compare
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
The unit tests are failing due to another bug, this PR will be rebased once the dependent PR is merged. Which will also fix the UTs. |
Run Gluten Clickhouse CI on x86 |
What changes were proposed in this pull request?
How was this patch tested?