
[SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable #35504

Conversation

@Kimahriman (Contributor) commented Feb 13, 2022

What changes were proposed in this pull request?

Add a new config to set the memory overhead factor for drivers and executors. Currently the memory overhead is hard coded to 10% (except in Kubernetes), and the only way to set it higher is to set it to a specific memory amount.
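
For illustration, a rough sketch of the calculation that becomes configurable (a sketch only: the identifiers and the 384 MiB floor mirror the existing YARN-style behavior and are not quoted from this PR):

object OverheadSketch {
  // Long-standing minimum overhead; the name here is illustrative.
  val MEMORY_OVERHEAD_MIN_MIB = 384L

  def containerMemoryMiB(
      heapMiB: Long,
      overheadFactor: Double,            // e.g. spark.executor.memoryOverheadFactor
      explicitOverheadMiB: Option[Long]  // e.g. spark.executor.memoryOverhead
  ): Long = {
    // An explicitly configured overhead takes precedence over the factor.
    val overhead = explicitOverheadMiB.getOrElse(
      math.max((heapMiB * overheadFactor).toLong, MEMORY_OVERHEAD_MIN_MIB))
    heapMiB + overhead
  }

  def main(args: Array[String]): Unit = {
    // 32 GiB heap with the default 0.1 factor -> 32768 + 3276 = 36044 MiB requested.
    println(containerMemoryMiB(32 * 1024, 0.1, None))
  }
}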

Why are the changes needed?

In dynamic environments where different people or use cases need different memory requirements, it would be helpful to set a higher memory overhead factor instead of having to set a higher specific memory overhead value. The kubernetes resource manager already makes this configurable. This makes it configurable across the board.

Does this PR introduce any user-facing change?

No change to default behavior, just adds a new config users can change.

How was this patch tested?

New UT to check the memory calculation.

@mridulm (Contributor) left a comment

Memory overhead is not specific to yarn, but applies to all resource managers.
Also, split it by driver and executor as well, to independently control them.

See all usages of EXECUTOR_MEMORY_OVERHEAD and DRIVER_MEMORY_OVERHEAD

@mridulm (Contributor) commented Feb 14, 2022

This is a useful change to have, thanks for working on this @Kimahriman !
+CC @zhouyejoe, @rmcyang who have worked on something similar IIRC

@AmplabJenkins commented

Can one of the admins verify this patch?

@Kimahriman (Contributor Author) commented

Memory overhead is not specific to yarn, but applies to all resource managers. Also, split it by driver and executor as well, to independently control them.

See all usages of EXECUTOR_MEMORY_OVERHEAD and DRIVER_MEMORY_OVERHEAD

So are you saying to use this opportunity to unify all the different resource managers with this config? Currently they each handle it in their own slightly different way, but I can try to.

@Kimahriman (Contributor Author) commented

Created new global configs and updated yarn, mesos, and kubernetes to use them. Still need to try to add tests for mesos and kubernetes to make sure they're taking effect.

@Kimahriman Kimahriman changed the title [SPARK-38194][YARN] Make yarn memory overhead factor configurable [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable Feb 15, 2022
@Kimahriman (Contributor Author) commented

@mridulm is this what you were thinking?

@mridulm (Contributor) left a comment

Just a few comments, mostly looks good to me.
Will let @dongjoon-hyun review it as well.

+CC @tgravescs

ConfigBuilder("spark.driver.memoryOverheadFactor")
  .doc("The amount of non-heap memory to be allocated per driver in cluster mode, " +
    "as a fraction of total driver memory. If memory overhead is specified directly, " +
    "it takes precedence.")
Contributor:

We can use the current description for spark.kubernetes.memoryOverheadFactor in docs/running-on-kubernetes.md as a template (instead of "non-heap memory").

Contributor Author:

Took part of it, let me know how it sounds

  .version("3.3.0")
  .doubleConf
  .checkValue(factor => factor >= 0 && factor < 1,
    "Ensure that memory overhead is a double between 0 --> 1.0")
Contributor:

factor > 0 should be sufficient; we can have cases where the overhead is higher than the JVM Xmx.

Contributor Author:

yeah wasn't sure why there was an upper limit, just copied from k8s

Contributor Author:

Done
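
For reference, after this feedback the config definition presumably reads roughly as follows (a sketch, not the exact merged diff; the doc wording follows the discussion above):

  private[spark] val DRIVER_MEMORY_OVERHEAD_FACTOR =
    ConfigBuilder("spark.driver.memoryOverheadFactor")
      .doc("Fraction of driver memory to be allocated as additional non-heap memory per " +
        "driver process in cluster mode. If spark.driver.memoryOverhead is set explicitly, " +
        "it takes precedence over this factor.")
      .version("3.3.0")
      .doubleConf
      .checkValue(factor => factor > 0, "Ensure that memory overhead is a double greater than 0")
      .createWithDefault(0.1)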

@@ -490,7 +490,7 @@ private[spark] object Config extends Logging {
       .doubleConf
       .checkValue(mem_overhead => mem_overhead >= 0,
         "Ensure that memory overhead is non-negative")
-      .createWithDefault(0.1)
+      .createOptional
Contributor:

Keep default (see below).

Contributor Author:

Done

@@ -53,18 +53,20 @@ private[spark] class BasicDriverFeatureStep(conf: KubernetesDriverConf)

  // Memory settings
  private val driverMemoryMiB = conf.get(DRIVER_MEMORY)
  private val memoryOverheadFactor = conf.get(MEMORY_OVERHEAD_FACTOR)
    .getOrElse(conf.get(DRIVER_MEMORY_OVERHEAD_FACTOR))
Contributor:

Use DRIVER_MEMORY_OVERHEAD_FACTOR and fall back to MEMORY_OVERHEAD_FACTOR when it is missing.
We probably need to use the get(key, defaultValue) method here though.

@@ -59,11 +59,13 @@ private[spark] class BasicExecutorFeatureStep(
  private val isDefaultProfile = resourceProfile.id == ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID
  private val isPythonApp = kubernetesConf.get(APP_RESOURCE_TYPE) == Some(APP_RESOURCE_TYPE_PYTHON)
  private val disableConfigMap = kubernetesConf.get(KUBERNETES_EXECUTOR_DISABLE_CONFIGMAP)
  private val memoryOverheadFactor = kubernetesConf.get(MEMORY_OVERHEAD_FACTOR)
    .getOrElse(kubernetesConf.get(EXECUTOR_MEMORY_OVERHEAD_FACTOR))

Contributor:

Same as in BasicDriverFeatureStep - change order of config query.

Contributor Author:

Went with if (conf.contains(...)) to more easily handle the type conversions and the default value of the backup setting.
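
In other words, the driver-side step presumably ends up with the same precedence pattern as the executor-side snippet quoted later in this thread (a sketch):

  // Sketch: prefer the new driver-specific factor when explicitly set, otherwise fall back
  // to the deprecated spark.kubernetes.memoryOverheadFactor (which still has a default).
  private val memoryOverheadFactor = if (conf.contains(DRIVER_MEMORY_OVERHEAD_FACTOR)) {
    conf.get(DRIVER_MEMORY_OVERHEAD_FACTOR)
  } else {
    conf.get(MEMORY_OVERHEAD_FACTOR)
  }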

@@ -85,6 +84,8 @@ private[spark] class Client(
  private var appMaster: ApplicationMaster = _
  private var stagingDirPath: Path = _

  private val amMemoryOverheadFactor = sparkConf.get(DRIVER_MEMORY_OVERHEAD_FACTOR)
Contributor:

Do this only when in cluster mode - in client mode, AM is not the driver.

Contributor Author:

So just keep the hard coded 0.1 for client mode AM overhead factor?

Contributor Author:

Or just throw in a new config for that too while I'm at it

Contributor:

We can add it for AM later if required - in client mode, AM is not really doing much.
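
Based on this exchange, the YARN Client presumably ends up choosing the AM factor by deploy mode, roughly like this (a sketch; the 0.1 value for the client-mode AM matches the previously hard-coded behavior, and the constant name below is illustrative):

  // Sketch: in cluster mode the AM is the driver, so use the driver factor; in client mode
  // the AM does little, so keep the previous hard-coded 10% overhead.
  private val amMemoryOverheadFactor = if (isClusterMode) {
    sparkConf.get(DRIVER_MEMORY_OVERHEAD_FACTOR)
  } else {
    AM_MEMORY_OVERHEAD_FACTOR  // illustrative name for the unchanged 0.1 client-mode value
  }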

@Kimahriman (Contributor Author) commented

Should I add these to the main docs page as well?

@tgravescs (Contributor) commented

So overall I'm fine with the concept and think it should be consistent across Spark. But going back to how/when this config was added, I believe it was specifically decided at that time not to have a configuration by percent on YARN. If I recall, that was based on the use cases at the time, so there may very well be more/different use cases now, or we just have more experience. I think part of that is that many of the off-heap type configs would be a specific size (e.g. offheap=5g). Can I ask what use cases this is targeting?

If this goes in, the docs need to be clear on the precedence of the configs.

@Kimahriman (Contributor Author) commented

Can I ask what use cases this is targeting?

There's no specific use case where I'm trying to use more off-heap memory outside of Spark's own off-heap features. I've just noticed more than 10% extra memory being used for our normal jobs (which don't use off-heap features). I have no idea why or what's using this memory, but something is. I definitely have one job with a memory leak on the driver side that is currently sitting at 100g of reserved memory with a 32g heap, and I haven't figured out why. We also tried turning on strict memory enforcement in YARN and constantly had executors killed for using too much memory. I haven't investigated these much either, but currently we calculate a higher memoryOverhead based on the executor memory to give it a little more breathing room. This change makes it easier to set a factor across the board instead of manually tuning memoryOverhead for each job.

I feel like I've heard that G1GC uses more off-heap memory than other garbage collectors; I don't know if that's true. Either way, this is just a quality-of-life improvement if other people run into similar issues.
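
As a usage note (the keys below are the new ones introduced by this PR; the 0.2 value is just an example), the per-job memoryOverhead computation described above could then collapse to a single factor:

  // Instead of hand-computing spark.executor.memoryOverhead for each job, set a factor once.
  val conf = new org.apache.spark.SparkConf()
    .set("spark.executor.memoryOverheadFactor", "0.2")  // 20% of executor memory as overhead
    .set("spark.driver.memoryOverheadFactor", "0.2")    // 20% of driver memory as overhead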

@mridulm (Contributor) commented Feb 18, 2022

@tgravescs This is actually a use case for us - in some of our internal analysis, the 10% overhead was found to be insufficient for some classes of applications. Instead of tuning this individually, and having users forget to keep it in sync, bumping up the percentage is an option being looked into.
@Kimahriman happened to create a PR for this independently :-)

@tgravescs (Contributor) commented

Sounds good. I haven't had time to go through the code in detail. Let's make sure we document the precedence between these and the memoryOverhead config, and update all the .md docs (yarn and configuration), which on a quick skim I didn't see.

@Kimahriman (Contributor Author) commented

Added the new configs to the docs page; let me know how they sound or if there are any suggestions on the wording.

@github-actions github-actions bot added the DOCS label Feb 20, 2022
@Kimahriman Kimahriman force-pushed the yarn-configurable-memory-overhead-factor branch from 85f8b71 to 4f8e7e6 Compare March 2, 2022 01:30
@tgravescs (Contributor) commented

+1. Thanks @Kimahriman

@dongjoon-hyun you have changes requested, can you take another look?

@tgravescs (Contributor) commented

@dongjoon-hyun double checking, are you ok with this?

@tgravescs (Contributor) commented

Since no feedback from @dongjoon-hyun I'm going to go ahead and merge this.

@tgravescs (Contributor) commented

Merged to master, thanks @Kimahriman

@asfgit asfgit closed this in 71e2110 Mar 16, 2022
@tgravescs (Contributor) commented

oops, I see they cut the branch-3.3, I will see about merging this to that branch

asfgit pushed a commit that referenced this pull request Mar 16, 2022

Closes #35504 from Kimahriman/yarn-configurable-memory-overhead-factor.

Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(cherry picked from commit 71e2110)
Signed-off-by: Thomas Graves <tgraves@apache.org>
@tgravescs (Contributor) commented

merged to branch-3.3 as well

@Yikun (Member) commented Mar 17, 2022


FYI, some K8S IT tests have failed since this commit: https://github.com/Yikun/spark/runs/5573264340?check_suite_focus=true

cc @dongjoon-hyun @tgravescs

@Kimahriman (Contributor Author) commented

It looks like the non-JVM memory overhead default isn't being applied to the executors, but I don't see how or where it would have been applied before this change, as the value is only used in BasicDriverFeatureStep.

@tgravescs (Contributor) commented

yeah that was my understanding from reading the code as well, I'll look into it some more as well.

Comment on lines +62 to +66
private val memoryOverheadFactor = if (kubernetesConf.contains(EXECUTOR_MEMORY_OVERHEAD_FACTOR)) {
  kubernetesConf.get(EXECUTOR_MEMORY_OVERHEAD_FACTOR)
} else {
  kubernetesConf.get(MEMORY_OVERHEAD_FACTOR)
}
@Yikun (Member) commented Mar 17, 2022

The reason should be here: before this change, kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) was used as the default factor, and it is 0.4 according to my debug watch on the IT "Run PySpark on simple pi.py example".

But now EXECUTOR_MEMORY_OVERHEAD_FACTOR takes priority, so MEMORY_OVERHEAD_FACTOR is overridden (0.1 by default). So the default behavior changed.

But I haven't found why kubernetesConf.get(MEMORY_OVERHEAD_FACTOR) is 0.4 yet; I couldn't find code in the IT that sets this explicitly.

cc @Kimahriman @tgravescs

@Yikun (Member) commented Mar 17, 2022

// The memory overhead factor to use. If the user has not set it, then use a different
// value for non-JVM apps. This value is propagated to executors.

I found it: it is propagated to the executors from the driver.
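
For context, the pre-existing driver-side behavior being described is roughly the following (a sketch; isNonJvmApp and the literal values are inferred from this thread rather than quoted from the source):

  // Sketch: when the user has not set spark.kubernetes.memoryOverheadFactor, non-JVM apps
  // (PySpark / SparkR) get a larger 0.4 default. The chosen value is written into the
  // driver's system properties, so it becomes the default the executors later see.
  val overheadFactor = if (conf.contains(MEMORY_OVERHEAD_FACTOR)) {
    conf.get(MEMORY_OVERHEAD_FACTOR)
  } else if (isNonJvmApp) {
    0.4  // the value observed in the failing IT above
  } else {
    0.1
  }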

Contributor:

Yeah, thanks, this is what I was also just looking at, but I'm not sure how it was propagated to the executors. I was looking at whether it goes through the KubernetesDriverConf somehow or possibly through the pod system properties:
MEMORY_OVERHEAD_FACTOR.key -> overheadFactor.toString

If you find it let me know, still investigating.

@tgravescs (Contributor) commented Mar 17, 2022

ok, I think I see how this is happening:

val resolvedDriverSpec = builder.buildFromFeatures(conf, kubernetesClient)
val configMapName = KubernetesClientUtils.configMapNameDriver
val confFilesMap = KubernetesClientUtils.buildSparkConfDirFilesMap(configMapName,
  conf.sparkConf, resolvedDriverSpec.systemProperties)

We build the driver spec, which includes the added system properties:

spec.systemProperties ++ addedSystemProperties

The system properties added in the driver feature steps include the memory overhead setting:


val additionalProps = mutable.Map(
  KUBERNETES_DRIVER_POD_NAME.key -> driverPodName,
  "spark.app.id" -> conf.appId,
  KUBERNETES_DRIVER_SUBMIT_CHECK.key -> "true",
  MEMORY_OVERHEAD_FACTOR.key -> overheadFactor.toString)

Then the KubernetesClientUtils.buildSparkConfDirFilesMap is called which propagates it to the executors (I think)

@Yikun (Member) commented Mar 17, 2022

Yep, I think so.

loadedConfFilesMap ++ Map(Constants.SPARK_CONF_FILE_NAME -> resolvedProperties)

Member:

I should also state that if people don't want this in 3.3, I'm personally fine with that; I just need input from anyone interested in the feature.

Thank you. Then, it's simpler because this PR was backported manually after the feature freeze. :)
We are currently discussing the whitelisting of late arrivals like this PR, aren't we, @tgravescs?
We can discuss there together to get the input you need.

@dongjoon-hyun (Member) commented Mar 17, 2022

Please unblock Apache Spark 3.3 K8s module QA period by reverting this. We can land it back after having a healthy commit.

Contributor Author:

Okay I think I get it, those "system properties" end up as default spark configs on the executor. Clear as mud

Contributor:

#35900 is the revert PR.

Member:

Thank you for your decision, @tgravescs .

@dongjoon-hyun (Member) commented

This is reverted from branch-3.3 via #35900 .

asfgit pushed a commit that referenced this pull request Mar 17, 2022
### What changes were proposed in this pull request?

Follow up to #35504 to fix k8s memory overhead handling.

### Why are the changes needed?

#35504 introduced a bug only caught by the K8S integration tests.

### Does this PR introduce _any_ user-facing change?

Fix back to old behavior.

### How was this patch tested?

See if IT passes

Closes #35901 from Kimahriman/k8s-memory-overhead-executors.

Authored-by: Adam Binford - Customer Site (Virginia) - CW 121796 <abinford@pu00cenvdi797.vdi.ux.dg.local>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Kimahriman added a commit to Kimahriman/spark that referenced this pull request Mar 19, 2022

Closes apache#35504 from Kimahriman/yarn-configurable-memory-overhead-factor.

Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
Kimahriman added a commit to Kimahriman/spark that referenced this pull request Mar 19, 2022

Closes apache#35901 from Kimahriman/k8s-memory-overhead-executors.

Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
@dongjoon-hyun (Member) left a comment

While testing the Apache Spark 3.3.0 RC, I noticed we missed a few things in this PR.

1. The Apache Spark 3.3.0 RC always complains about spark.kubernetes.memoryOverheadFactor because the configuration has a default value (i.e., even when it is not set by the users). There is no way to remove the warnings, which means the directional message is not helpful and confuses users in the wrong way. In other words, we still get warnings even when we use only the new configurations, or no configuration at all.
22/06/01 23:53:49 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
22/06/01 23:53:49 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
22/06/01 23:53:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/01 23:53:50 WARN SparkConf: The configuration key 'spark.kubernetes.memoryOverheadFactor' has been deprecated as of Spark 3.3.0 and may be removed in the future. Please use spark.driver.memoryOverheadFactor and spark.executor.memoryOverheadFactor
2. This PR introduces a breaking change because the new spark.[driver|executor].memoryOverheadFactor configs always have values and completely hide the old configuration in the K8s resource manager.
3. The documentation removal might be too early because deprecation is not removal of the configuration.
4. Lastly, the minimum constraint is slightly different because spark.kubernetes.memoryOverheadFactor has allowed 0 since Apache Spark 2.4 while the new configurations disallow it.

This is done as non-JVM tasks need more non-JVM heap space and such tasks commonly fail with "Memory Overhead Exceeded" errors. This preempts this error with a higher default.
</td>
<td>2.4.0</td>
</tr>
Member:

This is (3). We should not remove documentation during the deprecation stage.

@dongjoon-hyun (Member) commented Jun 1, 2022

FYI, cc @MaxGekk

@dongjoon-hyun (Member) commented

Hi, @tgravescs . Is there a way to mitigate this change in K8s environment at Apache Spark 3.3.0?

@dongjoon-hyun (Member) commented

To mitigate the changes, I made a PR. This PR's contribution is still valid. Only the deprecation is removed and the doc is recovered.
