[GLUTEN-8018][VL] Support adjusting stage resource profile dynamically #8209
Run Gluten Clickhouse CI on x86
Thank you for splitting the PR into two. Let's discuss in #8195 before moving forward with this one.
8073c47 to 7786636
Can we add some tests in this PR? The feature appears to be functional for users after this change. Thanks!
Also, could you change [CORE] to [VL] in the PR title if this is not ready for CH use yet?
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxRuleApi.scala
7786636 to 38dde01
Thank you, @zjuwangg. Since it's a big feature, would you like to create a doc for the feature in docs, or add it in a follow-up PR?
Yeah, of course. I'll add a detailed doc in the follow-up PR.
...ait/src/main/scala/org/apache/spark/sql/execution/GlutenAutoAdjustStageResourceProfile.scala
e34ed0f to 2ea2934
shims/common/src/main/scala/org/apache/gluten/GlutenConfig.scala
val c2RorR2CCnt = planNodes.count(
  p => p.isInstanceOf[ColumnarToRowTransition] || p.isInstanceOf[RowToColumnarTransition])
val totalCount = planNodes.size

if (1.0 * c2RorR2CCnt / totalCount >= glutenConf.autoAdjustStageC2RorR2CRatioThreshold) {
  val newMemoryAmount = memoryRequest.get.amount * glutenConf.autoAdjustStageRPHeapRatio
  val newExecutorMemory =
    new ExecutorResourceRequest(ResourceProfile.MEMORY, newMemoryAmount.toLong)
  executorResource.put(ResourceProfile.MEMORY, newExecutorMemory)
  val newRP = new ResourceProfile(executorResource.toMap, taskResource.toMap)
  return applyNewResourceProfileIfPossible(plan, newRP, rpManager)
}
plan
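For readers following the thread, the threshold check in the snippet above can be reduced to a small standalone sketch. This is only an illustration and not the code in this PR: `Node`, `Transition`, `Operator`, and `HeapAdjustSketch.adjustedMemory` are hypothetical names, the plan is modelled as a flat list, and the empty-stage guard is an added assumption.

```scala
// Hypothetical stand-ins for plan nodes; these are not Gluten's real classes.
sealed trait Node
case object Transition extends Node // stands in for a C2R or R2C transition
case object Operator extends Node   // any other plan node

object HeapAdjustSketch {

  /**
   * Returns the scaled memory amount when the share of transition nodes
   * reaches ratioThreshold, or None when the stage should be left alone.
   */
  def adjustedMemory(
      nodes: Seq[Node],
      currentMemory: Long,
      ratioThreshold: Double,
      heapRatio: Double): Option[Long] = {
    if (nodes.isEmpty) {
      // Assumption: an empty stage is never adjusted (avoids dividing by zero).
      None
    } else {
      val transitionCount = nodes.count(_ == Transition)
      val ratio = transitionCount.toDouble / nodes.size
      if (ratio >= ratioThreshold) Some((currentMemory * heapRatio).toLong) else None
    }
  }
}
```

Returning an `Option` keeps the "no change" case explicit, which mirrors the original code falling through to the unmodified `plan`.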
So we are still collecting c2RorR2CCnt, while @PHILO-HE suggested counting fallen-back nodes? Do you have any comment here? Thanks.
Good catch. I mistook the fallen-back nodes for c2RorR2CCnt. Will address soon!
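To make the distinction concrete, here is a rough sketch of counting fallen-back nodes rather than transitions. It is not the change that was eventually pushed: `PlanNode`, its `isNative` and `isTransition` flags, and `FallbackRatioSketch.fallbackRatio` are hypothetical, and leaving transition nodes out of the denominator is an assumption made only for this example.

```scala
// Hypothetical plan-node model; isNative marks a node that was offloaded to the
// native backend, isTransition marks a C2R or R2C node inserted between the two worlds.
final case class PlanNode(name: String, isNative: Boolean, isTransition: Boolean = false)

object FallbackRatioSketch {

  /** Share of non-transition nodes that fell back to vanilla Spark execution. */
  def fallbackRatio(nodes: Seq[PlanNode]): Double = {
    // Assumption: transitions are bookkeeping nodes, so they are not counted at all.
    val candidates = nodes.filterNot(_.isTransition)
    if (candidates.isEmpty) 0.0
    else candidates.count(n => !n.isNative).toDouble / candidates.size
  }
}
```

With a stage made of a native scan, a C2R transition, a fallen-back sort, and a native project, this metric counts one fallen-back node out of three candidates, whereas the transition-based count would report one transition out of four nodes.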
Basically the code structure looks good to me. I didn't test or verify it; do you have any test results for the feature? Thanks.
...s-velox/src/test/scala/org/apache/gluten/execution/AutoAdjustStageResourceProfileSuite.scala
Yes. It has worked in our internal version in a similar way.
I didn't verify the change, so let's hold on for some time to see if anyone would like to review before proceeding to merge. Thank you in advance for the contribution.
@zjuwangg I'm merging, and since no one else has helped verify this yet, would you like to mark the following as experimental in a new PR or so? Thanks.
Oops, the PR has conflicts. So we can do this together with a rebase, I guess.
LGTM from my side
cc @jackylee-ch if you have further comments
LGTM
What changes were proposed in this pull request?
This is a follow-up PR to #8195 and demonstrates how to adjust the resource profile through a rule.
How was this patch tested?
Unit tests will be updated soon.