
control de/ser offload #3793

Merged: 6 commits merged into dask:master from the not_offload branch on May 25, 2020

Conversation

@martindurant (Member) commented May 12, 2020

Fixes #3776

(posting some scratch code here, to pick up later)

@martindurant (Member, Author) commented

@mrocklin, how would I benchmark this to show whether it makes any difference?

@mrocklin (Member) commented

If you follow the link from the original issue, it points to a comment by @Kobzol. I recommend asking him.

@Kobzol (Contributor) commented May 14, 2020

@martindurant I can run a series of benchmarks with and without this change if you want. However, in my previous benchmarks it was the offloading in to_frames, not from_frames, that showed up as 10-15% of runtime in profiles.
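For context, here is a minimal sketch of the offload pattern being discussed, written against a generic dumps/sizeof pair; it is not distributed's actual implementation, and the threshold and helper names are made up. The detail that matters for this thread is that allowing offload requires estimating the message size first, so disabling offload also skips that sizeof call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Illustrative names only; the real logic lives in distributed/comm/utils.py.
_offload_executor = ThreadPoolExecutor(max_workers=1)
OFFLOAD_THRESHOLD = 10_000_000  # bytes; hypothetical cutoff


async def serialize_for_comm(msg, dumps, sizeof, allow_offload=True):
    """Serialize ``msg`` with ``dumps``, possibly in a worker thread."""
    if allow_offload and sizeof(msg) > OFFLOAD_THRESHOLD:
        # Large message: hand the CPU-heavy serialization to the executor so
        # the event loop stays responsive.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(_offload_executor, dumps, msg)
    # Small message, or offload disabled (e.g. on the scheduler): serialize
    # inline; note that the sizeof() estimate is skipped entirely in that case.
    return dumps(msg)
```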

@martindurant (Member, Author) commented

@Kobzol, oh! I can do the same thing at both ends, and then I'll ping you to see if it was helpful, thanks.

martindurant changed the title from "WIP: control deser offload" to "WIP: control de/ser offload" on May 14, 2020
@martindurant (Member, Author) commented

@Kobzol

@Kobzol (Contributor) commented May 18, 2020

I ran a few benchmarks. sizeof has mostly, but not completely, disappeared from the trace, because it's used in more places in the scheduler, while you have disabled it only for the main RPC connection. The scheduler uses multiple RPC connections (for example, for clients downloading data), so I think it should be disabled for all RPC connections originating on the scheduler. I'll try to run the benchmark again with allow_offload=False everywhere.

From my current results I have not seen any perf. improvements at all, so it's possible that this change will not have a large effect on the overall pipeline runtime. This is something we have observed quite often with our Rust scheduler implementation: sometimes even a big (local) win does not translate to faster pipelines if it's not on the overall hot path. But even if we found no perf. gains, I still think this change should be merged, as it's simply wasteful to calculate the size on the scheduler.

Hopefully in the future there will be a standardized benchmark suite that will allow us to test these changes easily.

@Kobzol (Contributor) commented May 18, 2020

FWIW, I got some wins when I turned off offloading everywhere (even on clients and workers):
[benchmark results image]

I would be interested in whether these perf. gains can be replicated when offloading is disabled only on the scheduler (but on all of its comms).

@martindurant (Member, Author) commented

Which comms are not avoiding offload on the scheduler?

@Kobzol (Contributor) commented May 20, 2020

The scheduler has an rpc attribute, which is a connection pool that handles out-of-band messages like gather. But it also creates separate connections for each connected client and worker. These connections handle streaming messages (compute-task, task-finished, etc.). It's the difference between self.handlers and worker_handlers/client_handlers in the scheduler.

When I inspect the comm variable in Scheduler::add_worker, it has allow_offload=True. And when I add print(allow_offload) to the to_frames method, it prints mostly False in the scheduler process. That's because most of the messages do not actually go through the Scheduler's rpc comm. But I might be wrong :)
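A sketch of the kind of instrumentation described above, assuming to_frames accepts an allow_offload keyword (as it does with this PR); modules that import the function directly may not see the wrapper, so treat it purely as illustrative.

```python
import distributed.comm.utils as comm_utils

_original_to_frames = comm_utils.to_frames


async def traced_to_frames(*args, allow_offload=True, **kwargs):
    # Report whether this serialization call is permitted to leave the event loop.
    print("to_frames allow_offload =", allow_offload)
    return await _original_to_frames(*args, allow_offload=allow_offload, **kwargs)


comm_utils.to_frames = traced_to_frames
```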

@mrocklin (Member) commented

Almost all messages into or out of a Dask Server pass through one of two places:

  1. Server.rpc, which holds all of the connections that this server initiates
  2. Server.listener, which is created in core.py::Server.listen and handles all of the inbound connections

There are a few other places from time to time, but we have tried hard to centralize connection handling in these two locations.
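A toy sketch of how both of those entry points could be covered, assuming (as this PR does) that comms and the rpc pool expose an allow_offload attribute consulted by to_frames/from_frames; the names loosely mirror distributed/core.py, but this is not the actual diff.

```python
class ToyScheduler:
    def __init__(self, connection_pool, listener):
        # 1. Server.rpc: connections this server initiates.  Flagging the pool
        #    means every comm drawn from it keeps serialization on the event loop.
        self.rpc = connection_pool
        self.rpc.allow_offload = False
        # 2. Server.listener: accepts inbound connections and hands each comm
        #    to handle_comm, where the same flag is applied per comm.
        self.listener = listener

    async def handle_comm(self, comm):
        comm.allow_offload = False  # skip the sizeof()/thread hop for this comm
        await self.serve(comm)

    async def serve(self, comm):
        ...  # dispatch to handlers / stream handlers
```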

@mrocklin (Member) commented

The changes here seem sensible to me. It feels awkward these days to modify an object rather than pass in a parameter, but I suspect that this is nicer than threading the allow_offload parameter through the superclasses, so it makes sense.

I'm +1. It would be good to get feedback from @Kobzol first, though, if he has time to run this through the benchmarks again.

@Kobzol (Contributor) commented May 22, 2020

I ran the benchmarks again and I got the same perf. gains when offloading was disabled only at the scheduler: around a 10-20% overall runtime improvement for my numpy and pandas benchmarks.

Seems to me like a net win, as it was showing up in profiles before on a very hot path and now it's completely gone.

[benchmark results image]

@martindurant (Member, Author) commented

@quasiben, do you want to give this a quick once-over, to see if this might affect UCX at all? (I don't think so.)

martindurant changed the title from "WIP: control de/ser offload" to "control de/ser offload" on May 22, 2020
@martindurant (Member, Author) commented

OK merging.

I would like to mention that distributed.utils.nbytes is needlessly called multiple times in a couple of places. That could be eliminated at the cost of slightly longer code, but I won't do it now unless it is shown to affect performance; the function itself is very short and simple.

martindurant merged commit 0ec78f8 into dask:master on May 25, 2020
martindurant deleted the not_offload branch on May 25, 2020, 17:32
@quasiben (Member) commented

Sorry for missing this, @martindurant. FWIW, the changes should not affect the UCX comms.

@mrocklin (Member) commented

Thanks for leading this, @martindurant!

I would like to mention that distributed.utils.nbytes is needlessly called multiple times in a couple of places. That could be eliminated at the cost of slightly longer code, but I won't do it now unless it is shown to affect performance; the function itself is very short and simple.

This should be fine, I think. The nbytes function is much faster than the sizeof function; nbytes is only called on bytes/memoryview/bytearray-type objects.
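For reference, a sketch of why nbytes is cheap, written to match the behaviour described above rather than guaranteed to be identical to distributed.utils.nbytes: it only ever sees bytes-like frames, so the size is read in constant time instead of recursively walking Python objects the way sizeof does.

```python
def nbytes(frame) -> int:
    # Frames are bytes-like by the time this is called, so len() answers directly.
    if isinstance(frame, (bytes, bytearray)):
        return len(frame)
    # memoryview (and other buffer objects) expose the byte count as .nbytes.
    return frame.nbytes
```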


Linked issue: #3776 (Don't offload serialization in scheduler)
4 participants