Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elastic Horovod on Spark auto-scale #1956

Merged
merged 10 commits into from
Jun 30, 2020

Conversation

EnricoMi
Copy link
Collaborator

@EnricoMi EnricoMi commented May 16, 2020

Blocked by #2063.

@EnricoMi EnricoMi force-pushed the branch-elastic-spark-auto-scale branch 3 times, most recently from 171c508 to 3910088 Compare May 21, 2020 10:09
@EnricoMi EnricoMi force-pushed the branch-elastic-spark-auto-scale branch from 3910088 to 9e791b7 Compare May 27, 2020 07:31
@EnricoMi EnricoMi force-pushed the branch-elastic-spark-auto-scale branch 2 times, most recently from 7e544be to 45c8fd0 Compare June 14, 2020 10:48
@EnricoMi EnricoMi changed the title [WIP] Elastic Horovod on Spark auto-scale Elastic Horovod on Spark auto-scale Jun 14, 2020
Copy link
Collaborator

@tgaddair tgaddair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! To unblock this for now, let's skip the failing tests. We can put together a follow-up PR to determine the root cause and fix them.

@@ -194,14 +194,31 @@ def wait_for_initial_registration(self, timeout):
self._wait_cond.release()

def wait_for_command_start(self, timeout=None):
"""Waits for a command to start.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overloaded behavior is a little tricky to follow. I would consider breaking this out into two methods: one that raises an exception and takes a Timeout object, and another that takes a numerical value and returns the state at the end of the timeout.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this is horrible. I have split it up into two methods. Thanks!

Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
…sting

Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Enrico Minack <github@enrico.minack.dev>
@EnricoMi EnricoMi force-pushed the branch-elastic-spark-auto-scale branch from 448501f to f69a110 Compare June 29, 2020 18:34
@EnricoMi EnricoMi merged commit 271c52e into horovod:master Jun 30, 2020
@EnricoMi EnricoMi deleted the branch-elastic-spark-auto-scale branch June 30, 2020 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants