Description
When submitting an application with a value for spark.executor.pyspark.memory in its sparkConf, this memory is added to the resource request of the executor, but it is not included in the yunikorn.apache.org/task-groups annotation. As a result the executor is stuck in Pending and Yunikorn reports that the pod's request is larger than what the placeholder reserved.
✋ I have searched the open/closed issues and my issue is not listed.
Reproduction Code [Required]
Steps to reproduce the behavior: submit a SparkApplication that sets spark.executor.pyspark.memory in its sparkConf, for example as sketched below.
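The original manifest from the report is not reproduced here; the following is a minimal sketch of a PySpark SparkApplication with spark.executor.pyspark.memory set in sparkConf. The namespace, image, service account and all memory values are illustrative assumptions, not the reporter's actual configuration.

```yaml
# Illustrative sketch only: a minimal PySpark application with
# spark.executor.pyspark.memory set in sparkConf. All values are assumptions.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pythonpi
  namespace: default
spec:
  type: Python
  mode: cluster
  image: spark:3.5.2
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.2"
  batchScheduler: yunikorn
  sparkConf:
    spark.executor.pyspark.memory: "2000m"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 4g
    memoryOverhead: 1g
```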
Expected behavior
The task group for the executors should include spark.executor.pyspark.memory in its minResource.memory.
Actual behavior
The minResource.memory of the generated task group is only the sum of memory and memoryOverhead, while the resources.requests.memory of the executor pods is the sum of memory, memoryOverhead and spark.executor.pyspark.memory. This results in pods that never get scheduled.
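The annotation and pod spec from the report are not included above, but with the assumed values from the sketch (memory: 4g, memoryOverhead: 1g, spark.executor.pyspark.memory: 2000m) the mismatch would look roughly like this:

```yaml
# Illustrative sketch of the mismatch, using the assumed values above.
# Task group as written into the yunikorn.apache.org/task-groups annotation:
# minResource.memory only covers memory + memoryOverhead (4096Mi + 1024Mi).
metadata:
  annotations:
    yunikorn.apache.org/task-groups: |
      [{
        "name": "spark-executor",
        "minMember": 1,
        "minResource": {"cpu": "1", "memory": "5120Mi"}
      }]
---
# Executor pod spec (fragment): the request additionally includes the PySpark
# memory (4096Mi + 1024Mi + 2000Mi = 7120Mi), larger than the 5120Mi placeholder.
spec:
  containers:
    - name: spark-kubernetes-executor
      resources:
        requests:
          cpu: "1"
          memory: "7120Mi"
```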
Terminal Output Screenshot(s)
Pod events of an unschedulable executor:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduling 7m22s yunikorn default/pythonpi-ce8d8891ff1c8bd7-exec-1 is queued and waiting for allocation
Normal GangScheduling 7m22s yunikorn Pod belongs to the taskGroup spark-executor, it will be scheduled as a gang member
Example log in Yunikorn (the requested resource exceeds the placeholder by 7650410496 - 5553258496 = 2097152000 bytes, i.e. 2000 MiB, which matches the spark.executor.pyspark.memory that the placeholder never reserved):
2024-09-17T08:32:29.119Z WARN core.scheduler.application objects/application.go:1130 releasing placeholder: real allocation is larger than placeholder {"requested resource": "map[memory:7650410496 pods:1 vcore:1000]", "placeholderID": "b978035d-e27c-4e2e-b3bf-4cd5f10b6fdb-0", "placeholder resource": "map[memory:5553258496 pods:1 vcore:1000]"}
Environment & Versions
Spark Operator App version: v2.0.0-rc.0
Helm Chart Version: v2.0.0-rc.0
Kubernetes Version: 1.25.7
Apache Spark version: 3.5.2
@tcassaert Thanks for reporting the bug. @jacobsalway Could you take a look at this issue? I think we missed the spark.executor.pyspark.memory conf when calculating the memory needed for the yunikorn task group.
I can replicate the issue locally on Kind with the provided instructions, thanks. Looks like this file in apache/spark contains the logic you've flagged. I'll put together a fix for this.
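For reference, using the same assumed values as in the sketches above, a task-groups annotation that accounts for spark.executor.pyspark.memory would reserve the full executor request, so the placeholder and the real pod match. This is only a sketch of what the fix should produce, not output from any released version:

```yaml
# Illustrative sketch: minResource.memory now includes the 2000Mi of PySpark
# memory, so the placeholder (7120Mi) is no smaller than the real executor request.
metadata:
  annotations:
    yunikorn.apache.org/task-groups: |
      [{
        "name": "spark-executor",
        "minMember": 1,
        "minResource": {"cpu": "1", "memory": "7120Mi"}
      }]
```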