-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: adapt to new Task spec in dask, now used in blockwise #556
Conversation
I asked on the dask PR whether there are specific migration instructions associated with their change. |
@martindurant There must be some way to just pop out a good old dictionary from the Task class. |
We will likely need to put a |
If I'm understanding this correctly a |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Some docs on that method would be very useful. |
for more information, see https://pre-commit.ci
:) |
Is this something I'll need to get on? |
@martindurant seems like it - but it also looks like |
So the graph is right, the dependency must be there, else it wouldn't compute - but cull is making some other assumption that we don't meet. |
I have found the source of the difference. After dask/dask#11568 the delayed array no longer shows up as a constant dependency in the task graph coming from this loop: I checked it's not another kind of dep, even in old dask the delayed-wrapped array is only a constant. If I add code to correctly deal with ...
if isinstance(arg, Alias):
arg = arg.target.key
... Then everything works as expected again. |
@fjetter is this skipping of |
I also find that if I use Understanding of what is correct would be appreciated. For the time being I will change the dask_awkward code to use a |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
OK - only failing tests are over in uproot where we'll have to patch things up to deal with Tasks there as well! |
However, we should settle these correct usage issues and get bugfixes in the right places if they are necessary. We then wait on further input and guidance from @fjetter as to correct/expected usage. |
It's also a bit weird that the Judging from that I guess we'd want |
@fjetter when you have time, we would appreciate your commentary so that we can resolve this. |
@fjetter just a ping |
@@ -37,7 +37,8 @@ classifiers = [ | |||
] | |||
dependencies = [ | |||
"awkward >=2.5.1", | |||
"dask >=2023.04.0", | |||
"dask >=2024.12.0;python_version>'3.9'", | |||
"dask >=2023.04.0;python_version<'3.10'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In live discussions, we were tending towards dropping backward compatibility here, which means dropping py3.9 support (which dask and numpy already have). Users of py3.9 will not have hit the original problem, since the new dask was not released for them.
This would also save about half the lOC in this PR.
@fjetter are you available to discuss this? Thanks! |
if you wanted to maintain this, I would likely recommend vendoring. The old classes still work. Legacy graphs generally still work. You just got hit by me refactoring Blockwise right away (I looked at your code but missed this, appologies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
@@ -1928,7 +1935,10 @@ def partitionwise_layer( | |||
pairs.extend([arg.name, "i"]) | |||
numblocks[arg.name] = (1,) | |||
elif isinstance(arg, Delayed): | |||
pairs.extend([arg.key, None]) | |||
if _dask_uses_tasks: | |||
pairs.extend([TaskRef(arg.key), None]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's correct 👍
new_layer = copy.deepcopy(layer) | ||
task = new_layer.task.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that the task specific copy is not required after the deepcopy. I was already contemplating whether we should get rid of copy
(because it is difficult to maintain / would require subclasses to overwrite it and we might want to make use of subclassing)
arg.key if isinstance(arg, GraphNode) else arg | ||
for arg in layer.task.args | ||
] | ||
# how to do this with `.substitute(...)`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still an open question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was unsure how do implement this with .substitute()
. I used our internal function instead, but it would be nice to use .substitute
if that does the same thing.
It's not a show stopper right now though.
@fjetter Just for reference, the source of confusion about I realize it's a sharp edge but since you said TaskRef is correct here, it would be good to correct/clarify the docs for posterity. |
@martindurant @pfackeldey I would say we merge as-is, and deal with the backwards compat stuff later. We should get uproot and coffea passing again with some priority... |
uproot was an easy fix scikit-hep/uproot5#1352 |
Adapts one bit of dask_awkward code that makes a graph to use a task object instead.
c.f.:
dask/dask#11568