[SPARK-5411] Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext #4111
Conversation
/cc @ksakellis, who originally requested this feature.
Test build #25787 has finished for PR 4111 at commit
@andrewor14, could you take a quick pass through this to see if it seems reasonable?
Hey, so for this one, I think it might be good to use a configuration mechanism rather than exposing a different constructor. The issue with this approach is that users have to rewrite their code to use a different constructor in order to use this, whereas many applications might be written using the standard SparkContext constructor. What about having a configuration option that allows you to specify classes of additional listeners, where we expect each listener to have a constructor that accepts a SparkConf?
Then, when we construct a SparkContext, we instantiate these with the SparkConf and attach them to the context.
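The proposal above can be sketched without any Spark dependencies. This is a minimal illustration of the constructor-selection rule (prefer a one-argument constructor taking the conf, else fall back to a zero-argument constructor); `FakeConf`, `Listener`, and the two listener classes are illustrative stand-ins, not Spark's real types.

```scala
// Dependency-free sketch of the config-driven listener instantiation idea.
class FakeConf(val settings: Map[String, String])

trait Listener { def name: String }

// A listener that wants the conf at construction time.
class ConfAwareListener(conf: FakeConf) extends Listener {
  override def name: String = "conf-aware"
}

// A listener with only a zero-argument constructor.
class PlainListener extends Listener {
  override def name: String = "plain"
}

def instantiateListener(conf: FakeConf, cls: Class[_ <: Listener]): Listener = {
  // Prefer a single-argument constructor that accepts the conf type.
  val confCtor = cls.getConstructors.find { c =>
    c.getParameterTypes.toSeq == Seq(classOf[FakeConf])
  }
  confCtor match {
    case Some(ctor) => ctor.newInstance(conf).asInstanceOf[Listener]
    case None       => cls.getConstructor().newInstance().asInstanceOf[Listener]
  }
}
```

In the real patch the class names would come from a comma-separated config value and be resolved with `Class.forName` before this selection step.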
@pwendell I think using configuration to add Spark listeners adds complexity to the system.
@vanzin had the idea, which I like, of creating a SparkContextBuilder object that would make construction easier than adding more constructor arguments. So you can imagine something like:

```scala
val context = SparkContextBuilder
  .addConfig(conf)
  .addListeners(...)
  .setSparkHome(...)
  .build()
```

We can obviously keep supporting the existing constructors for compatibility but move towards this for new constructor arguments. Just an idea.
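The builder idea floated above could look roughly like the following. This is a hypothetical sketch: no such builder exists in Spark, and every name here (`ContextBuilder`, `BuiltContext`) is illustrative.

```scala
// Hypothetical builder that accumulates settings and assembles them in one
// step at build(), instead of a growing list of constructor overloads.
case class BuiltContext(
    conf: Map[String, String],
    listeners: Seq[String],
    sparkHome: Option[String])

class ContextBuilder {
  private var conf = Map.empty[String, String]
  private var listeners = Vector.empty[String]
  private var sparkHome: Option[String] = None

  def addConfig(kv: (String, String)): this.type = { conf += kv; this }
  def addListeners(classNames: String*): this.type = { listeners ++= classNames; this }
  def setSparkHome(home: String): this.type = { sparkHome = Some(home); this }

  // All validation/assembly happens in one place.
  def build(): BuiltContext = BuiltContext(conf, listeners, sparkHome)
}
```

Usage mirrors the snippet in the comment above: `new ContextBuilder().addConfig("spark.app.name" -> "demo").addListeners("com.example.A").build()`.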
Whether we use a builder pattern vs. adding more constructors is slightly orthogonal to what I was saying. My point was that it would be good if there were some way to avoid making the user who instantiates a SparkContext explicitly identify listeners in compiled code. It would be more flexible if, e.g., a packager could add listeners that will always be attached in a specific environment. Since listeners are a semi-public integration point, it seems reasonable to have a hook here.

Having builder-style constructors for SparkContext is a separate proposal. In the past we've tried to put as much of the logic as possible into configurations and generally avoid having complex constructors. The thinking was that it's not good to have two disparate ways of doing configuration (what happens if I have spark.home in a config and then I manually set sparkHome? It can be confusing). That's something we could look at separately, though, if we want to move away from the current approach.
Test build #25854 has finished for PR 4111 at commit
I've updated this patch to sketch out a proposal for SparkConf-based listener configuration, as Patrick suggested. Please take another look and note the updated PR description.
```
@@ -379,6 +379,41 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
  }
  executorAllocationManager.foreach(_.start())

  // Use reflection to instantiate listeners specified via the `spark.extraListeners` configuration
```
Nit: Can we group this logic with listenerBus.start() into a separate method called setupListenerBus() and call it in the constructor? The reason is that this logic has to occur before listenerBus.start(), and grouping them in one method better ensures the ordering is preserved.
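The reviewer's suggestion (register listeners and start the bus inside one method, so the ordering cannot drift apart) can be sketched as follows. `Bus` here is a toy stand-in for Spark's LiveListenerBus, not the real class.

```scala
// Toy listener bus that enforces "register before start".
class Bus {
  private var started = false
  private val listeners = scala.collection.mutable.Buffer.empty[String]

  def addListener(className: String): Unit = {
    require(!started, "listeners must be registered before the bus starts")
    listeners += className
  }

  def start(): Unit = { started = true }

  def registered: Seq[String] = listeners.toList
}

// Grouping both steps in one method guarantees the required ordering.
def setupAndStartListenerBus(bus: Bus, extraListeners: Seq[String]): Unit = {
  extraListeners.foreach(bus.addListener) // register configured listeners first
  bus.start()                             // then start, so no events are missed
}
```

Notably, the merged patch's commit list below includes exactly this refactoring ("Move code into setupAndStartListenerBus() method").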
Test build #26073 has finished for PR 4111 at commit
I've updated this PR to remove the SPARK_EXTRA_LISTENERS variable. I've also edited the PR description and title to link to a new JIRA, https://issues.apache.org/jira/browse/SPARK-5411, which describes the new mechanism added in this PR, rather than https://issues.apache.org/jira/browse/SPARK-5190, which should be addressed via a different mechanism (either another overloaded SparkContext constructor or a SparkContext builder).
Test build #26117 has finished for PR 4111 at commit
Jenkins, retest this please.
Test build #26382 has finished for PR 4111 at commit
…-sparklistener-in-sc-constructor

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
	core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala
Test build #26520 has finished for PR 4111 at commit
I'm going to merge this in a little bit unless there's any additional feedback.
LGTM |
…aded when creating SparkContext

This patch introduces a new configuration option, `spark.extraListeners`, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is initialized. From the configuration documentation:

> A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception.

The motivation for this patch is to allow monitoring code to be easily injected into existing Spark programs without having to modify those programs' code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4111 from JoshRosen/SPARK-5190-register-sparklistener-in-sc-constructor and squashes the following commits:

8370839 [Josh Rosen] Two minor fixes after merging with master
6e0122c [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5190-register-sparklistener-in-sc-constructor
1a5b9a0 [Josh Rosen] Remove SPARK_EXTRA_LISTENERS environment variable.
2daff9b [Josh Rosen] Add a couple of explanatory comments for SPARK_EXTRA_LISTENERS.
b9973da [Josh Rosen] Add test to ensure that conf and env var settings are merged, not overriden.
d6f3113 [Josh Rosen] Use getConstructors() instead of try-catch to find right constructor.
d0d276d [Josh Rosen] Move code into setupAndStartListenerBus() method
b22b379 [Josh Rosen] Instantiate SparkListeners from classes listed in configurations.
9c0d8f1 [Josh Rosen] Revert "[SPARK-5190] Allow SparkListeners to be registered before SparkContext starts."
217ecc0 [Josh Rosen] Revert "Add addSparkListener to JavaSparkContext"
25988f3 [Josh Rosen] Add addSparkListener to JavaSparkContext
163ba19 [Josh Rosen] [SPARK-5190] Allow SparkListeners to be registered before SparkContext starts.

(cherry picked from commit 9a7ce70)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
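As a small illustration of the configuration contract quoted above: the value of `spark.extraListeners` is a comma-separated list of class names, so in real Spark code the user-facing side is just `new SparkConf().set("spark.extraListeners", "com.example.MyListener")`. Below is a dependency-free sketch of splitting that value into class names; the listener class names are hypothetical and Spark's actual parsing may differ in details such as whitespace handling.

```scala
// Split a comma-separated `spark.extraListeners`-style value into the class
// names to instantiate, trimming whitespace and dropping empty entries.
def parseExtraListeners(value: String): Seq[String] =
  value.split(',').map(_.trim).filter(_.nonEmpty).toSeq
```

For example, `parseExtraListeners("com.example.A, com.example.B")` yields the two class names, each of which would then be resolved and constructed as described in the quoted documentation.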