Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: adapt to new Task spec in dask, now used in blockwise #556

Merged
merged 29 commits into from
Dec 16, 2024
Merged

Conversation

lgray
Copy link
Collaborator

@lgray lgray commented Dec 4, 2024

Adapts one bit of dask_awkward code that makes a graph to use a task object instead.

c.f.:
dask/dask#11568

@martindurant
Copy link
Collaborator

I asked on the dask PR whether there are specific migration instructions associated with their change.

@lgray
Copy link
Collaborator Author

lgray commented Dec 4, 2024

@martindurant rewrite_layer_chains needs to be adjusted, along with various checking that's happening during the tests. The latter is all localized in one function for the most part so hopefully a reasonable fix.

There must be some way to just pop out a good old dictionary from the Task class.

@lgray
Copy link
Collaborator Author

lgray commented Dec 4, 2024

We will likely need to put a dask >= 2024.12.0 requirement on the next release, and a dask < 2024.12.0 for python < 3.10.

@pfackeldey
Copy link
Collaborator

pfackeldey commented Dec 4, 2024

If I'm understanding this correctly a Task allows to .substitute dependencies? This could be useful for graph cloning and then replacing the IO layer - or am I misunderstanding this?

@martindurant
Copy link
Collaborator

Task allows to .substitute itself

Some docs on that method would be very useful.

@martindurant
Copy link
Collaborator

appease mypy's insatiable lust for perfect correctness

:)

@martindurant
Copy link
Collaborator

rewrite_layer_chains needs to be adjusted, along with various checking that's happening during the tests

Is this something I'll need to get on?

@lgray
Copy link
Collaborator Author

lgray commented Dec 4, 2024

@martindurant seems like it - but it also looks like rewrite_layer_chains get significantly more easy to read using the new fuse/substitute interfaces in Task.

@martindurant
Copy link
Collaborator

if I skip the cull step everything computes as expected

So the graph is right, the dependency must be there, else it wouldn't compute - but cull is making some other assumption that we don't meet.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

Indeed, something has definitely changed:
image

Even though the blockwise_optimized layers claim the Delayed as a dependency (!!) cull drops the Delayed.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

I have found the source of the difference.

After dask/dask#11568 the delayed array no longer shows up as a constant dependency in the task graph coming from this loop:
https://github.com/dask/dask/blob/main/dask/blockwise.py#L674-L684

I checked it's not another kind of dep, even in old dask the delayed-wrapped array is only a constant.

If I add code to correctly deal with Aliases to https://github.com/dask/dask/blob/main/dask/blockwise.py#L674-L684, i.e.:

...
            if isinstance(arg, Alias):
                arg = arg.target.key
...

Then everything works as expected again.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

@fjetter is this skipping of Aliases as constant dependencies of blockwise graphs desired or expected?

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

I also find that if I use TaskRef instead of Alias for our Delayed object everything optimizes and executes correctly without issue.

Understanding of what is correct would be appreciated. For the time being I will change the dask_awkward code to use a TaskRef as opposed to an Alias to see what else breaks in our tests.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

OK - only failing tests are over in uproot where we'll have to patch things up to deal with Tasks there as well!

@jpivarski

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

However, we should settle these correct usage issues and get bugfixes in the right places if they are necessary.

We then wait on further input and guidance from @fjetter as to correct/expected usage.

@lgray
Copy link
Collaborator Author

lgray commented Dec 7, 2024

It's also a bit weird that the GraphNode base class's .ref() function returns an Alias. Hmm.

Judging from that I guess we'd want _cull_dependencies to deal with Aliases in some way, but perhaps the most correct fix is elsewhere.

@lgray
Copy link
Collaborator Author

lgray commented Dec 9, 2024

@fjetter when you have time, we would appreciate your commentary so that we can resolve this.

@lgray
Copy link
Collaborator Author

lgray commented Dec 11, 2024

@fjetter just a ping

@martindurant
Copy link
Collaborator

Thanks for all your effort here, @lgray . I hope @fjetter can OK the code now.

@@ -37,7 +37,8 @@ classifiers = [
]
dependencies = [
"awkward >=2.5.1",
"dask >=2023.04.0",
"dask >=2024.12.0;python_version>'3.9'",
"dask >=2023.04.0;python_version<'3.10'",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In live discussions, we were tending towards dropping backward compatibility here, which means dropping py3.9 support (which dask and numpy already have). Users of py3.9 will not have hit the original problem, since the new dask was not released for them.

This would also save about half the lOC in this PR.

@lgray
Copy link
Collaborator Author

lgray commented Dec 16, 2024

@fjetter are you available to discuss this? Thanks!

@fjetter
Copy link

fjetter commented Dec 16, 2024

TaskRef is the way to go. That's one of the very sharp edges we still have.

dropping backward compatibility here, w

if you wanted to maintain this, I would likely recommend vendoring. The old classes still work. Legacy graphs generally still work. You just got hit by me refactoring Blockwise right away (I looked at your code but missed this, appologies)

Copy link

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@@ -1928,7 +1935,10 @@ def partitionwise_layer(
pairs.extend([arg.name, "i"])
numblocks[arg.name] = (1,)
elif isinstance(arg, Delayed):
pairs.extend([arg.key, None])
if _dask_uses_tasks:
pairs.extend([TaskRef(arg.key), None])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's correct 👍

Comment on lines +245 to +246
new_layer = copy.deepcopy(layer)
task = new_layer.task.copy()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that the task specific copy is not required after the deepcopy. I was already contemplating whether we should get rid of copy (because it is difficult to maintain / would require subclasses to overwrite it and we might want to make use of subclassing)

arg.key if isinstance(arg, GraphNode) else arg
for arg in layer.task.args
]
# how to do this with `.substitute(...)`?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still an open question?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was unsure how do implement this with .substitute(). I used our internal function instead, but it would be nice to use .substitute if that does the same thing.

It's not a show stopper right now though.

@lgray
Copy link
Collaborator Author

lgray commented Dec 16, 2024

@fjetter Just for reference, the source of confusion about Alias vs. TaskRef w.r.t. constant delayed objects comes from here: https://github.com/dask/dask/blob/main/dask/blockwise.py#L388-L404

I realize it's a sharp edge but since you said TaskRef is correct here, it would be good to correct/clarify the docs for posterity.

@lgray
Copy link
Collaborator Author

lgray commented Dec 16, 2024

@martindurant @pfackeldey I would say we merge as-is, and deal with the backwards compat stuff later. We should get uproot and coffea passing again with some priority...

@lgray
Copy link
Collaborator Author

lgray commented Dec 16, 2024

uproot was an easy fix scikit-hep/uproot5#1352

@lgray lgray merged commit d3f3e7c into main Dec 16, 2024
24 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants