
spark.comet.memory.overhead.min not respected when submitting jobs with Comet with Spark on Kubernetes #605

Closed
lmouhib opened this issue Jun 28, 2024 · 9 comments · Fixed by #836


lmouhib commented Jun 28, 2024

Describe the bug

Currently, when submitting a job on Kubernetes, the total memory of the driver or executor pod is the sum of the memory defined in the Spark configuration and the overhead (spark.{executor|driver}.memory + spark.{executor|driver}.memoryOverhead). This does not include the additional overhead that Comet derives from spark.comet.memory.overhead.factor (or the floor set by spark.comet.memory.overhead.min).
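For illustration, a minimal sketch of the sizing arithmetic as observed; the figures are arbitrary, and the max(factor * memory, min) formula is an assumption based on the config names, not the actual Spark or Comet code:

```scala
import org.apache.spark.SparkConf

// Illustrative settings; values are arbitrary.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.executor.memoryOverhead", "1g")
  .set("spark.comet.memory.overhead.factor", "0.2")
  .set("spark.comet.memory.overhead.min", "402653184") // 384 MiB, in bytes

val executorMemMiB = conf.getSizeAsMb("spark.executor.memory")
val overheadMiB    = conf.getSizeAsMb("spark.executor.memoryOverhead")

// What the Kubernetes pod is sized to today:
val podMemMiB = executorMemMiB + overheadMiB // 5120 MiB

// What Comet additionally needs (assumed formula: max(factor * memory, min)):
val factor        = conf.getDouble("spark.comet.memory.overhead.factor", 0.2)
val minMiB        = conf.getSizeAsBytes("spark.comet.memory.overhead.min") / (1024L * 1024L)
val cometOverhead = math.max((executorMemMiB * factor).toLong, minMiB) // 819 MiB, unreserved
```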

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

lmouhib added the bug label Jun 28, 2024
comphead (Contributor) commented

That's true, thanks for raising this. Spark Kubernetes pods respect Spark params but currently know nothing about Comet params. Once a pod's memory is allocated, it is not possible to change it. Since Comet has no control over Spark params, I'm inclined to fold spark.comet.memory.overhead.factor into the Spark memoryOverhead, but I'm open to other proposals.
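As a hypothetical stop-gap until something like that lands, a user could fold the Comet share into the Spark overhead by hand; the figures below are illustrative only:

```scala
import org.apache.spark.SparkConf

// Hand-sized overhead: 1g for Spark itself plus ~819m for Comet (0.2 * 4g),
// so the Kubernetes pod request is large enough for Comet's native allocations.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.executor.memoryOverhead", "1843m")
```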


lmouhib commented Jun 28, 2024

I am inclined to include it in the Spark overhead. However, I am not sure I understand why it is not taken into account here. That line of code should produce the new value of the memory overhead, which is then passed to Spark and the resource manager to create the pod template that is applied when the pod is created.

comphead (Contributor) commented

I think what might happen is that the executor pod has already started by the time Comet tries to tweak the memory, and once the pod is up it is not possible to change the allocated memory size. What Comet does is correct, but I have a feeling it should happen earlier.


viirya commented Jun 29, 2024

If CometDriverPlugin is loaded, it should overwrite the executor overhead with the existing executor overhead + the Comet overhead.
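A sketch of the adjustment described above, assuming the plugin can rewrite the conf on the driver before executors are requested; the helper name, defaults, and the max(factor * memory, min) formula are illustrative, not the actual CometDriverPlugin code:

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: bump spark.executor.memoryOverhead by the Comet overhead.
def addCometOverhead(conf: SparkConf): Unit = {
  val executorMemMiB = conf.getSizeAsMb("spark.executor.memory", "1g")
  val factor         = conf.getDouble("spark.comet.memory.overhead.factor", 0.2)
  val minMiB         = conf.getSizeAsBytes("spark.comet.memory.overhead.min", "384m") / (1024L * 1024L)
  val cometMiB       = math.max((executorMemMiB * factor).toLong, minMiB)
  val existingMiB    = conf.getSizeAsMb("spark.executor.memoryOverhead", "384m")
  conf.set("spark.executor.memoryOverhead", s"${existingMiB + cometMiB}m")
}
```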


lmouhib commented Jun 30, 2024

We need to document the CometDriverPlugin; it is not mentioned in the configurations or on any page of the documentation.

I did set spark.plugins to org.apache.spark.CometPlugin and still did not observe any change in the executor container's memory configuration.
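For reference, the wiring that was tried, expressed as a SparkConf sketch; whether this registers the driver plugin early enough on Kubernetes is exactly what is in question here:

```scala
import org.apache.spark.SparkConf

// Enable the Comet plugin; it is expected to adjust the executor overhead on the driver.
val conf = new SparkConf()
  .set("spark.plugins", "org.apache.spark.CometPlugin")
```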


comphead commented Jul 1, 2024

I'll run some tests on this today


lmouhib commented Jul 1, 2024

How is the documentation for the configuration generated? The only mention of the plugin is in the code base itself. I can open a PR to add the conf if it is maintained in the documentation itself and not auto-generated.


comphead commented Jul 8, 2024

Depends on #643

comphead (Contributor) commented

Depends on #689
