Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove sortedcontainers #7245

Closed
wants to merge 3 commits into from
Closed

Conversation

gjoseph92
Copy link
Collaborator

This makes Scheduler.workers and Scheduler.idle no longer sorted. Sorted containers are a bit slower than their plain equivalents. Most importantly, iterating over them (which we do in plenty of places) is O(nlogn) instead of O(n).

In practice, actual performance is probably worse; I found iteration to be 10x slower in microbenchmarks: #4925 (comment).

This would only be done after #7221. The only code relying on these collections being sorted is the old no-deps decide_worker logic, which would be unreachable after that PR.

I took this to the extreme and entirely removed the sortedcontainers dependency. If we wanted to be more conservative, we could only do 2b8ebf1 (I think we should absolutely do this at least if we merge #7221).

@fjetter @crusaderky

  • Tests added / passed
  • Passes pre-commit run --all-files

Not sure if it's necessary to actually sort anywhere
`Computation.code` was the only other place it was used. Doesn't seem worth the dependency for that.
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±0         15 suites  ±0   6h 25m 56s ⏱️ - 2m 53s
  3 168 tests ±0    3 081 ✔️  -   2    84 💤 +1    3 +1 
23 440 runs  ±0  22 526 ✔️  - 10  902 💤 +2  12 +8 

For more details on these failures, see this check.

Results for commit aa0cde0. ± Comparison against base commit 02b9430.

Comment on lines -2183 to -2204
worker_pool = self.idle or self.workers
# FIXME idle and workers are SortedDict's declared as dicts
# because sortedcontainers is not annotated
wp_vals = cast("Sequence[WorkerState]", worker_pool.values())
n_workers: int = len(wp_vals)
if n_workers < 20: # smart but linear in small case
ws = min(wp_vals, key=operator.attrgetter("occupancy"))
assert ws
if ws.occupancy == 0:
# special case to use round-robin; linear search
# for next worker with zero occupancy (or just
# land back where we started).
wp_i: WorkerState
start: int = self.n_tasks % n_workers
i: int
for i in range(n_workers):
wp_i = wp_vals[(i + start) % n_workers]
if wp_i.occupancy == 0:
ws = wp_i
break
else: # dumb but fast in large case
ws = wp_vals[self.n_tasks % n_workers]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW if it's just about this code path, we can drop the sortedset entirely by putting workers in a list as well as a set. that'd be a LogN operation whenever a worker leaves the cluster to remove that worker from the list. That's much better than a NlogN every time we iterate over the workers

@crusaderky
Copy link
Collaborator

There are non-trivial merge conflicts - could you resolve them?

@gjoseph92
Copy link
Collaborator Author

This depends on #7221, which it doesn't seem is happening any time soon.

#7280 is kinda similar to that, without changing the rootish definition. I think the order we'd merge things is

  1. Consistent worker selection for no-deps cases #7280
  2. All tasks without dependencies are root-ish #7221
  3. this PR

So I'll close this for now, will re-open if we're going to work on this.

@gjoseph92 gjoseph92 closed this Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants