Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-8018][VL] Support adjusting stage resource profile dynamically #8209

Merged
merged 10 commits into from
Jan 24, 2025

Conversation

zjuwangg
Copy link
Contributor

What changes were proposed in this pull request?

It's a follow PR of #8195 and demonstrate how to adjust resource profile through Rule.

How was this patch tested?

Unit test will be updated soon.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

@github-actions github-actions bot added CORE works for Gluten Core VELOX CLICKHOUSE labels Dec 11, 2024
Copy link

#8018

Copy link

Run Gluten Clickhouse CI on x86

@zhztheplayer zhztheplayer changed the title [GLUTEN-8018]Introduce GlutenAutoAdjustStageResourceProfileRule to auto adjust stage resource profile [GLUTEN-8018] Introduce GlutenAutoAdjustStageResourceProfileRule to auto adjust stage resource profile Dec 12, 2024
@zhztheplayer
Copy link
Member

Thank you for splitting the PR into 2. Let's discuss in #8195 before moving forward to this one.

@PHILO-HE PHILO-HE changed the title [GLUTEN-8018] Introduce GlutenAutoAdjustStageResourceProfileRule to auto adjust stage resource profile [GLUTEN-8018][CORE] Support adjust stage resource profile dynamically Dec 17, 2024
@PHILO-HE PHILO-HE changed the title [GLUTEN-8018][CORE] Support adjust stage resource profile dynamically [GLUTEN-8018][CORE] Support adjusting stage resource profile dynamically Dec 17, 2024
Copy link

Run Gluten Clickhouse CI on x86

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add some tests in this PR? Since the feature seems functional to users after the change. Thanks!

Also, could change [CORE] to [VL] in PR title if this is not ready for CH use yet.

Copy link

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg changed the title [GLUTEN-8018][CORE] Support adjusting stage resource profile dynamically [GLUTEN-8018][VL] Support adjusting stage resource profile dynamically Dec 18, 2024
Copy link

Run Gluten Clickhouse CI on x86

@FelixYBW
Copy link
Contributor

Thank you, @zjuwangg

Since it's a big feature. Would you like to create a doc for the featuer in docs? Or add in a follow up PR?

@zjuwangg
Copy link
Contributor Author

Thank you, @zjuwangg

Since it's a big feature. Would you like to create a doc for the featuer in docs? Or add in a follow up PR?

Yeah, of course. I'll add detailed doc in the follow up PR.

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg requested a review from jackylee-ch December 27, 2024 05:36
Copy link

github-actions bot commented Jan 2, 2025

Run Gluten Clickhouse CI on x86

Copy link

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg requested a review from zhztheplayer January 15, 2025 03:26
@zhztheplayer
Copy link
Member

Run Gluten Clickhouse CI on x86

Comment on lines 96 to 107
val c2RorR2CCnt = planNodes.count(
p => p.isInstanceOf[ColumnarToRowTransition] || p.isInstanceOf[RowToColumnarTransition])
val totalCount = planNodes.size

if (1.0 * c2RorR2CCnt / totalCount >= glutenConf.autoAdjustStageC2RorR2CRatioThreshold) {
val newMemoryAmount = memoryRequest.get.amount * glutenConf.autoAdjustStageRPHeapRatio;
val newExecutorMemory =
new ExecutorResourceRequest(ResourceProfile.MEMORY, newMemoryAmount.toLong)
executorResource.put(ResourceProfile.MEMORY, newExecutorMemory)
val newRP = new ResourceProfile(executorResource.toMap, taskResource.toMap)
return applyNewResourceProfileIfPossible(plan, newRP, rpManager)
}
plan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are still collecting c2RorR2CCnt? While @PHILO-HE suggested to count on fallen back nodes. Do you have any comment here? Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are still collecting c2RorR2CCnt? While @PHILO-HE suggested to count on fallen back nodes. Do you have any comment here? Thanks.

Good catch. I mistake the fallen node with r2cOrC2RCnt. Will address soon!

Copy link

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg requested a review from zhztheplayer January 17, 2025 06:57
Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the code structure looks good to me. I didn't test or verify it, do you have any test result about the feature? Thanks.

@zjuwangg
Copy link
Contributor Author

Basically the code structure looks good to me. I didn't test or verify it, do you have any test result about the feature? Thanks.

Yes. Worked in our inner version in an similar way.

@zhztheplayer
Copy link
Member

I didn't verify the change so let's hold on for some time to see if anyone likes to review before proceeding to merge. Thank you for the contribution in advance.

@zhztheplayer
Copy link
Member

@zjuwangg I'm merging and since there's no one else helping verify this yet, would you like to mark the following as experimental with a new PR or so? Thanks.

  1. Scala class AutoAdjustStageResourceProfileSuite
  2. The config options added by this PR

@zhztheplayer
Copy link
Member

@zjuwangg I'm merging and since there's no one else helping verify this yet, would you like to mark the following as experimental with a new PR or so? Thanks.

  1. Scala class AutoAdjustStageResourceProfileSuite
  2. The config options added by this PR

Oops, the PR has confilicts. So we can do this together with a rebase I guess.

Copy link

Run Gluten Clickhouse CI on x86

Copy link
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my side

cc @jackylee-ch if you have further comments

Copy link
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jackylee-ch jackylee-ch merged commit 291ff35 into apache:main Jan 24, 2025
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CORE works for Gluten Core ready to merge VELOX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants