Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: Document CometPlugin to start Comet in cluster mode #836

Merged
merged 6 commits into from
Aug 19, 2024

Conversation

comphead
Copy link
Contributor

Which issue does this PR close?

Closes #826
Closes #605

Rationale for this change

Document the way to start Comet in cluster mode so the resource manager respects Comet memory configuration parameters

What changes are included in this PR?

How are these changes tested?

@comphead comphead added the documentation Improvements or additions to documentation label Aug 16, 2024
@comphead comphead changed the title Documenting CometPlugin doc: Document CometPlugin to start Comet in cluster mode Aug 16, 2024
@@ -37,6 +37,15 @@ Comet will allocate at least `spark.comet.memory.overhead.min` memory.

If both `spark.comet.memoryOverhead` and `spark.comet.memory.overhead.factor` are set, the former will be used.

## Memory Tuning in clusters
In clusters like Kubernetes or YARN to make the resource manager respect correctly Comet memory parameters `spark.comet.memory*`
it is needed to pass to the starting command line additional Spark configuration parameter `--conf spark.plugins=org.apache.spark.CometPlugin`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why we don't specify the plugin in all cases instead of --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions since the plugin does (or can) register the session extensions?

Other Spark accelerators specify a plugin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a great question. Its not needed for local run, but Spark in enterprises get started mostly in clusters. Perhaps we can also update the installation guide and make a remark about the plugin if people run the Comet in cluster mode

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is mostly because we develop the extension at the beginning of Comet project. So all documents and customs are to specify the extension. We develop the Comet plugin after a while when we want to have it to configure memory easily.

Comment on lines 156 to 158
### Cluster mode
Running in cluster mode it might be needed to set addtitional `--conf spark.plugins=org.apache.spark.CometPlugin`
to make the cluster resource managares respect Comet memory parameters. More [Memory Tuning in cluster mode](./tuning.md#memory-tuning-in-cluster-mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not specified for cluster usage, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not specified for cluster usage, I think.

sorry, I dont follow. Does the comment require any fix? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I don't think CometPlugin is bound to cluster mode. We don't need this section probably.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we need mention it on the installation page, as this is the first page user interacts with and they interested in this topic. Perhaps we can rephrase it somehow

comphead and others added 3 commits August 16, 2024 13:44
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
@comphead
Copy link
Contributor Author

@andygrove @viirya please have a second look?

@comphead comphead merged commit 27ab86b into apache:main Aug 19, 2024
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
* Documenting CometPlugin

Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 27ab86b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
3 participants