TaskGroup.nbytes_in_memory miscounted for replicated keys #4927

Closed
gjoseph92 opened this issue Jun 18, 2021 · 3 comments

Comments

@gjoseph92
Collaborator

I think there is a logic error in the bookkeeping for TaskGroup.nbytes_in_memory. There's a discrepancy between how we increment it and how we decrement it when multiple workers hold the same key.

In transition_memory_released, we decrement it by nbytes once for every worker that holds that task:

for ws in ts._who_has:
    del ws._has_what[ts]
    ws._nbytes -= ts_nbytes
    ts._group._nbytes_in_memory -= ts_nbytes

Whereas in _propagate_forgotten, we decrement it by nbytes once if any workers hold the task, regardless of how many. This doesn't match transition_memory_released:

ts_nbytes: Py_ssize_t = ts.get_nbytes()
if ts._who_has:
    ts._group._nbytes_in_memory -= ts_nbytes
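
To make the discrepancy concrete, here is a minimal standalone sketch (toy objects standing in for the scheduler state, not distributed's real classes). With two workers holding a 100-byte key that was counted once, the released path over-decrements while the forgotten path balances out:

# Toy illustration only; not distributed's TaskState/TaskGroup.
class ToyGroup:
    nbytes_in_memory = 0

ts_nbytes = 100
who_has = ["worker-a", "worker-b"]  # two workers hold a replica of the key

# Released path: decrement once per worker, as in transition_memory_released.
group = ToyGroup()
group.nbytes_in_memory = ts_nbytes  # the key was counted once when it landed in memory
for ws in who_has:
    group.nbytes_in_memory -= ts_nbytes
print(group.nbytes_in_memory)  # -100: one extra decrement per additional copy

# Forgotten path: decrement once if any worker holds it, as in _propagate_forgotten.
group = ToyGroup()
group.nbytes_in_memory = ts_nbytes
if who_has:
    group.nbytes_in_memory -= ts_nbytes
print(group.nbytes_in_memory)  # 0: matches the single increment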

On the creation side, in TaskState.set_nbytes, we only increment it by the diff between the last known value and the current value. If the key is being copied to multiple workers, this difference is usually 0:

def set_nbytes(self, nbytes: Py_ssize_t):
    diff: Py_ssize_t = nbytes
    old_nbytes: Py_ssize_t = self._nbytes
    if old_nbytes >= 0:
        diff -= old_nbytes
    self._group._nbytes_total += diff
    self._group._nbytes_in_memory += diff
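
As a rough, self-contained sketch of that diff logic (a toy class, assuming each worker reports the same size for the key), the second report contributes 0 to the counter:

# Toy stand-in for the diff-based increment in TaskState.set_nbytes.
class ToyTask:
    def __init__(self):
        self._nbytes = -1           # sentinel for "size not yet known"
        self.group_nbytes_in_memory = 0

    def set_nbytes(self, nbytes):
        diff = nbytes
        if self._nbytes >= 0:
            diff -= self._nbytes    # only the change since the last report is added
        self.group_nbytes_in_memory += diff
        self._nbytes = nbytes

ts = ToyTask()
ts.set_nbytes(100)  # first worker reports the key: counter += 100
ts.set_nbytes(100)  # a replica appears elsewhere, same size reported: diff == 0
print(ts.group_nbytes_in_memory)  # 100 -- incremented once per key, not per copy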

In short, I think TaskGroup.nbytes_in_memory is incremented once per key, but decremented once per copy of the key.

If nbytes can be different for different workers, then to do this bookkeeping correctly, I think we'd also need to track TaskState.total_nbytes (size of all copies of the key), then decrement by that once in transition_memory_released and _propagate_forgotten.
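
A very rough sketch of that idea; the attribute and method names below (total_nbytes, add_replica, release_all_replicas) are hypothetical, not existing distributed API. The point is only that increments and decrements balance when the total over all copies is tracked and released exactly once:

# Hypothetical bookkeeping sketch; names here are made up for illustration.
class ToyTask:
    def __init__(self):
        self.total_nbytes = 0             # size summed over all in-memory copies
        self.group_nbytes_in_memory = 0

    def add_replica(self, nbytes):
        # Called once per worker that gains the key in memory.
        self.total_nbytes += nbytes
        self.group_nbytes_in_memory += nbytes

    def release_all_replicas(self):
        # Called exactly once when the key leaves memory (released or forgotten).
        self.group_nbytes_in_memory -= self.total_nbytes
        self.total_nbytes = 0

ts = ToyTask()
ts.add_replica(100)
ts.add_replica(100)                        # replicas may even differ in size
ts.release_all_replicas()
print(ts.group_nbytes_in_memory)           # 0 -- increments and decrements balance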

Discovered in #4925 (comment). I think #4925 made this more apparent, since it encourages more data replication.

cc @crusaderky since you know more about replicated keys.

@fjetter
Member

fjetter commented Jun 18, 2021

I stumbled over this myself recently; see

# TODO: Are we supposed to track replicated memory here? See also Scheduler.add_keys
assert tg.nbytes_in_memory == y.nbytes

where this behaviour is intentionally pinned, but I believe that was a mistake. Intuitively, I would also expect this to be different.

From what I can see, the undercounting is introduced in Scheduler.add_keys, where workers let the scheduler know once they have a replica. There, the group nbytes is not increased:

if ts not in ws._has_what:
    ws._nbytes += ts.get_nbytes()
    ws._has_what[ts] = None
    ts._who_has.add(ws)
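
As a toy illustration of that asymmetry (stand-in objects only, not the real scheduler classes), a new replica announced through this path raises the per-worker total but leaves the group total untouched:

# Toy stand-ins for the add_keys bookkeeping; not distributed's real classes.
class ToyWorker:
    def __init__(self):
        self._nbytes = 0
        self._has_what = {}

class ToyTask:
    def __init__(self, nbytes):
        self._nbytes = nbytes
        self._who_has = set()
        self.group_nbytes_in_memory = nbytes  # counted once when first computed

    def get_nbytes(self):
        return self._nbytes

ts = ToyTask(100)
for ws in (ToyWorker(), ToyWorker()):   # two workers announce a replica
    if ts not in ws._has_what:
        ws._nbytes += ts.get_nbytes()   # per-worker memory is tracked...
        ws._has_what[ts] = None
        ts._who_has.add(ws)
        # ...but nothing increments ts.group_nbytes_in_memory here.

print(ts.group_nbytes_in_memory)        # still 100, regardless of replica count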

I noticed this in my deadlock PR, which had already grown without bounds, so I didn't fix it there.

@mrocklin
Member

Maybe this will solve the problem? #4930

@jrbourbeau
Member

Closed via #4930
