Fix flaky test_dont_steal_long_running_tasks #6197

Merged
fjetter merged 1 commit into dask:main from long_running_occupancy on Apr 26, 2022

crusaderky
Collaborator

@crusaderky crusaderky commented Apr 25, 2022

https://github.com/dask/distributed/runs/6159504580?check_suite_focus=true

        actual_total_occupancy = 0
        for worker, ws in self.workers.items():
>           assert abs(sum(ws.processing.values()) - ws.occupancy) < 1e-8
E           AssertionError

Long-running tasks are in WorkerState.processing but don't contribute to WorkerState.occupancy, so the check above, which compares sum(ws.processing.values()) against ws.occupancy, can fail.

See

old = ws.processing.get(ts, 0)
ws.processing[ts] = total_duration
if ts not in ws.long_running:
    self.total_occupancy += total_duration - old
    ws.occupancy += total_duration - old
return total_duration

and
occ: float = ws.processing[ts]
ws.occupancy -= occ
self.total_occupancy -= occ
# Cannot remove from processing since we're using this for things like
# idleness detection. Idle workers are typically targeted for
# downscaling but we should not downscale workers with long running
# tasks
ws.processing[ts] = 0
ws.long_running.add(ts)
self.check_idle_saturated(ws)
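
The interaction between the two snippets above can be reproduced with a toy model. This is only a sketch: `ToyWorkerState` and `invariant_holds` are hypothetical names, the two functions are stripped-down copies of the quoted code (without `total_occupancy` or `check_idle_saturated`), and the order of calls is an assumption about how the mismatch can arise, not a trace of the failing CI run.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkerState:
    # Hypothetical stand-in for WorkerState; only the three fields used here.
    processing: dict = field(default_factory=dict)
    long_running: set = field(default_factory=set)
    occupancy: float = 0.0


def set_duration_estimate(ws, ts, total_duration):
    # Mirrors the first snippet: processing is always updated,
    # occupancy only when the task is not long-running.
    old = ws.processing.get(ts, 0)
    ws.processing[ts] = total_duration
    if ts not in ws.long_running:
        ws.occupancy += total_duration - old
    return total_duration


def handle_long_running(ws, ts):
    # Mirrors the second snippet: the task stays in processing with value 0.
    occ = ws.processing[ts]
    ws.occupancy -= occ
    ws.processing[ts] = 0
    ws.long_running.add(ts)


def invariant_holds(ws):
    # Same comparison as the failing assertion in the traceback.
    return abs(sum(ws.processing.values()) - ws.occupancy) < 1e-8


ws = ToyWorkerState()
set_duration_estimate(ws, "x", 0.5)
assert invariant_holds(ws)

handle_long_running(ws, "x")
assert invariant_holds(ws)  # processing["x"] == 0 and occupancy == 0

# Assumed trigger: the duration is estimated again after the task
# became long-running, so processing grows but occupancy does not.
set_duration_estimate(ws, "x", 2.0)
broken = not invariant_holds(ws)
```

In this model the invariant holds right after the task is flagged as long-running and breaks only once its duration is estimated again.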

@crusaderky crusaderky self-assigned this Apr 25, 2022
@crusaderky crusaderky added the flaky test Intermittent failures on CI. label Apr 25, 2022
@github-actions
Contributor

Unit Test Results

    16 files  ±0        16 suites  ±0    7h 31m 31s ⏱️ +13m 21s
 2 731 tests  ±0     2 648 ✔️ -1         82 💤 ±0     1 ❌ +1
21 733 runs   ±0    20 684 ✔️ -3      1 048 💤 +2     1 ❌ +1

For more details on these failures, see this check.

Results for commit 68ca3bb. ± Comparison against base commit 198522b.

@fjetter
Member

fjetter commented Apr 26, 2022

The failing test is about a leaking thread; see also #5275.

It appears to happen very rarely.

@fjetter fjetter merged commit bd910f0 into dask:main Apr 26, 2022
@crusaderky crusaderky deleted the long_running_occupancy branch April 26, 2022 10:09