Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't send empty dependencies #3423

Merged
merged 3 commits into from
May 23, 2020
Merged

Don't send empty dependencies #3423

merged 3 commits into from
May 23, 2020

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Jan 29, 2020

When a client sends the update-graph message, it sends a dictionary with dependencies of individual tasks. Tasks without dependencies needlessly occupy bandwidth in this message, since the scheduler anyway does something like dependencies.get(task, ()) when it parses the message (if I'm wrong please let me know).

For example for this simple merge with a lot of tasks without dependencies:

from dask import delayed
from dask.distributed import Client

client = Client("tcp://localhost:8786")

@delayed
def do_something(x):
    return x * 10

@delayed
def merge(*args):
    return sum(args)

xs = [do_something(x) for x in range(1000)]
result = merge(*xs).compute()

This PR changes the dependencies in the update-graph message from this:

{
    "merge-3cfcfa4f-fa40-45ba-b181-54eb7554377f": ('do_something-ae852807-355c-4d87-85e4-72e6f1d0a146', 'do_something-3e003b98-363b-42ab-b270-691db098f222', ...),
    "do_something-9f9820f5-fbe9-4d31-ba9d-ac17996e4c35": (),
    "do_something-34c2c63b-2d3d-4e5c-8fef-f94ab635e19f": (),
    ... # 998 additional lines
}

into this:

{
    "merge-3cfcfa4f-fa40-45ba-b181-54eb7554377f": ('do_something-ae852807-355c-4d87-85e4-72e6f1d0a146', 'do_something-3e003b98-363b-42ab-b270-691db098f222', ...),
}

For the aforementioned graph, it reduces the update-graph total frame byte size by the following amounts:

Merge task # Original size Reduced size % of original
5000 4170630 B 3910628 B 94 %
10000 8340666 B 7820664 B 94 %
15000 12510702 B 11730700 B 94 %
20000 16680738 B 15640736 B 94 %
25000 20850774 B 19550772 B 94 %

@Kobzol Kobzol requested a review from mrocklin January 29, 2020 10:00
@mrocklin
Copy link
Member

Thank you for the pull request @Kobzol . I'm curious, was this work motivated by something in particular? The performance changes here are small, so I'm trying to decide if adding this change is worth the added complexity. Was there a motivating application?

@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 30, 2020

While working on the Rust scheduler, I noticed that there is a lot of empty dependencies being sent, which seemed inefficient to me. I wouldn't send this PR if it was a complicated change, but since it's a one liner, I thought that it's a nice improvement for such a small change.

I later had to modify the Graphlayout to work with tasks instead of dependencies, but even that doesn't seem like a big change.

@mrocklin mrocklin merged commit 3c7b5b1 into dask:master May 23, 2020
@mrocklin
Copy link
Member

Thanks @Kobzol ! Sorry for the delay

@Kobzol Kobzol deleted the dependencies branch May 23, 2020 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants