[SPARK-47383][CORE] Support `spark.shutdown.timeout` config
### What changes were proposed in this pull request?

Make the shutdown hook timeout configurable via a new `spark.shutdown.timeout` config. If it is not set, Spark falls back to the existing behavior: a default timeout of 30 seconds, or whatever is defined in core-site.xml for the `hadoop.service.shutdown.timeout` property.

### Why are the changes needed?

Spark sometimes times out during the shutdown process. When that happens, data still sitting in the shutdown queues is dropped, which causes metadata loss (e.g. event logs, anything written by custom listeners). Before this change the timeout is not easily configurable: the underlying `org.apache.hadoop.util.ShutdownHookManager` has a default timeout of 30 seconds. It can be changed by setting `hadoop.service.shutdown.timeout`, but only in core-site.xml/core-default.xml, because a new Hadoop conf object is created internally and there is no opportunity to modify it.

### Does this PR introduce _any_ user-facing change?

Yes, a new config `spark.shutdown.timeout` is added.

### How was this patch tested?

Manual testing in spark-shell. This behavior is not practical to cover with a unit test.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#45504 from robreeves/sc_shutdown_timeout.

Authored-by: Rob Reeves <roreeves@linkedin.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
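As a rough illustration of how such a config could be wired in (a hedged sketch, not the actual patch: the object name `ShutdownTimeoutSketch`, the `HookPriority` constant, and the `installHook` helper are hypothetical), the value can be read from the `SparkConf` and, when present, passed explicitly to Hadoop's `ShutdownHookManager` so it takes precedence over the 30-second default:

```scala
import java.util.concurrent.TimeUnit

import org.apache.hadoop.util.ShutdownHookManager
import org.apache.spark.SparkConf
import org.apache.spark.network.util.JavaUtils

object ShutdownTimeoutSketch {
  // Hypothetical hook priority, for illustration only.
  private val HookPriority = 50

  def installHook(conf: SparkConf, hookBody: () => Unit): Unit = {
    val hook = new Runnable { override def run(): Unit = hookBody() }
    // If spark.shutdown.timeout is set (e.g. "60s"), pass it explicitly to Hadoop's
    // ShutdownHookManager; otherwise keep the existing behavior (Hadoop's 30-second
    // default, or hadoop.service.shutdown.timeout from core-site.xml).
    conf.getOption("spark.shutdown.timeout").map(s => JavaUtils.timeStringAsMs(s)) match {
      case Some(timeoutMs) =>
        ShutdownHookManager.get().addShutdownHook(hook, HookPriority, timeoutMs, TimeUnit.MILLISECONDS)
      case None =>
        ShutdownHookManager.get().addShutdownHook(hook, HookPriority)
    }
  }
}
```

A user would then opt in with something like `spark-shell --conf spark.shutdown.timeout=60s`; when the config is absent, the `None` branch preserves the previous behavior.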