-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update heartbeat checks in scheduler #3896
Conversation
I was a little confused by this statement:
For now, I increased the maximum heartbeat to 10s |
At the 20k worker level even this results in heartbeats every 500us, which might still be a burden on the scheduler. I recommend that we drop the maximum interval entirely. |
Sorry for being dense here but if we removed the maximum what should we be doing for messages like the following when len(workers)>200
|
Lets say that we want no more than 200 heartbeats a second or something
like that, and adjust the heartbeat interval accordingly to match that
based on the number of workers.
…On Mon, Jun 15, 2020 at 8:46 AM Benjamin Zaitlen ***@***.***> wrote:
At the 20k worker level even this results in heartbeats every 500us, which
might still be a burden on the scheduler. I recommend that we drop the
maximum interval entirely.
Sorry for being dense here but if we removed the maximum what should we be
doing for messages like the following when len(workers)>200
return {
"status": "OK",
"time": time(),
"heartbeat-interval": heartbeat_interval(len(self.workers)),
}
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#3896 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACKZTCA7BLTFDB2Y37DFNTRWY64ZANCNFSM4N6IQKUQ>
.
|
@@ -5303,7 +5303,9 @@ def _reevaluate_occupancy_worker(self, ws): | |||
async def check_worker_ttl(self): | |||
now = time() | |||
for ws in self.workers.values(): | |||
if ws.last_seen < now - self.worker_ttl: | |||
if (ws.last_seen < now - self.worker_ttl) and ( | |||
10 * heartbeat_interval(len(self.workers)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Terribly confused about this change. Doesn't 10 * heartbeat_interval()
always evaluate to true since heartbeat_interval
is strictly positive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're right. i messed up the logic here.
Is heartbeat_interval something in the future can be set in config like other Parameters? |
@garanews this PR has been closed. I encourage you to raise a new issue at https://github.com/dask/distributed/issues/new |
attempt to fix #3877