Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update heartbeat checks in scheduler #3896

Merged
merged 2 commits into from
Jun 18, 2020
Merged

Conversation

quasiben
Copy link
Member

attempt to fix #3877

@quasiben
Copy link
Member Author

I was a little confused by this statement:

I recommend modifying the distributed/scheduler.py::heartbeat_interval
function to remove the maximum

For now, I increased the maximum heartbeat to 10s

@mrocklin
Copy link
Member

For now, I increased the maximum heartbeat to 10s

At the 20k worker level even this results in heartbeats every 500us, which might still be a burden on the scheduler. I recommend that we drop the maximum interval entirely.

@quasiben
Copy link
Member Author

At the 20k worker level even this results in heartbeats every 500us, which might still be a burden on the scheduler. I recommend that we drop the maximum interval entirely.

Sorry for being dense here but if we removed the maximum what should we be doing for messages like the following when len(workers)>200

return {
            "status": "OK",
            "time": time(),
            "heartbeat-interval": heartbeat_interval(len(self.workers)),
        }

@mrocklin
Copy link
Member

mrocklin commented Jun 15, 2020 via email

@mrocklin mrocklin merged commit 920af0f into dask:master Jun 18, 2020
@@ -5303,7 +5303,9 @@ def _reevaluate_occupancy_worker(self, ws):
async def check_worker_ttl(self):
now = time()
for ws in self.workers.values():
if ws.last_seen < now - self.worker_ttl:
if (ws.last_seen < now - self.worker_ttl) and (
10 * heartbeat_interval(len(self.workers))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terribly confused about this change. Doesn't 10 * heartbeat_interval() always evaluate to true since heartbeat_interval is strictly positive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right. i messed up the logic here.

@quasiben quasiben deleted the fix/heartbeat branch June 30, 2020 20:47
@garanews
Copy link

Is heartbeat_interval something in the future can be set in config like other Parameters?

@mrocklin
Copy link
Member

@garanews this PR has been closed. I encourage you to raise a new issue at https://github.com/dask/distributed/issues/new

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make heartbeat easier to configure
4 participants