fix: adapt to new Task spec in dask, now used in blockwise #556

lgray · 2024-12-04T15:12:31Z

Adapts one bit of dask_awkward code that makes a graph to use a task object instead.

martindurant · 2024-12-04T15:28:08Z

I asked on the dask PR whether there are specific migration instructions associated with their change.

lgray · 2024-12-04T15:41:10Z

@martindurant rewrite_layer_chains needs to be adjusted, along with various checking that's happening during the tests. The latter is all localized in one function for the most part so hopefully a reasonable fix.

There must be some way to just pop out a good old dictionary from the Task class.

lgray · 2024-12-04T15:42:18Z

We will likely need to put a dask >= 2024.12.0 requirement on the next release, and a dask < 2024.12.0 for python < 3.10.
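One way to support both sides of that version split is to detect the feature rather than compare version strings. The following is only a sketch under assumptions: `has_module` and `dask_uses_tasks` are hypothetical names chosen for illustration (the PR itself refers to a `_dask_uses_tasks` flag), and the only thing taken from the thread is that the new classes live in `dask._task_spec`.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if `name` can be located.

    The module itself is not imported, although find_spec does import
    any parent packages of a dotted name.
    """
    try:
        return find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "dask" itself) is not installed.
        return False


# Hypothetical flag, playing the role of `_dask_uses_tasks` in this PR.
dask_uses_tasks = has_module("dask._task_spec")
```

Feature detection of this kind avoids parsing version strings, at the cost of tying the check to a private module name that may move in later releases.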

pfackeldey · 2024-12-04T15:50:18Z

If I'm understanding this correctly, a Task lets you .substitute dependencies? This could be useful for graph cloning and then replacing the IO layer, or am I misunderstanding this?

martindurant · 2024-12-04T15:51:54Z

Task allows to .substitute itself

Some docs on that method would be very useful.

martindurant · 2024-12-04T16:00:53Z

appease mypy's insatiable lust for perfect correctness

:)

martindurant · 2024-12-04T16:30:41Z

rewrite_layer_chains needs to be adjusted, along with various checking that's happening during the tests

Is this something I'll need to get on?

lgray · 2024-12-04T16:33:31Z

@martindurant seems like it - but it also looks like rewrite_layer_chains gets significantly easier to read using the new fuse/substitute interfaces in Task.

martindurant · 2024-12-07T19:24:35Z

if I skip the cull step everything computes as expected

So the graph is right, the dependency must be there, else it wouldn't compute - but cull is making some other assumption that we don't meet.

lgray · 2024-12-07T19:29:25Z

Indeed, something has definitely changed:

Even though the blockwise_optimized layers claim the Delayed as a dependency (!!), cull drops the Delayed.

lgray · 2024-12-07T20:03:56Z

I have found the source of the difference.

After dask/dask#11568 the delayed array no longer shows up as a constant dependency in the task graph coming from this loop:
https://github.com/dask/dask/blob/main/dask/blockwise.py#L674-L684

I checked that it's not another kind of dep; even in old dask the delayed-wrapped array is only a constant.

If I add code to correctly deal with Aliases to https://github.com/dask/dask/blob/main/dask/blockwise.py#L674-L684, i.e.:

...
            if isinstance(arg, Alias):
                arg = arg.target.key
...

Then everything works as expected again.
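To illustrate the failure mode described above, here is a toy model, not dask's actual code. `ToyTaskRef`, `ToyAlias`, and `constant_deps` are hypothetical stand-ins invented for this sketch; the only detail borrowed from the snippet above is that an alias exposes its referenced key through `.target.key`. The sketch shows how a dependency scan that recognizes only direct references silently skips an alias-wrapped key, and how unwrapping the alias restores it.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class ToyTaskRef:
    """Hypothetical stand-in for a direct reference to a graph key."""
    key: str


@dataclass(frozen=True)
class ToyAlias:
    """Hypothetical stand-in for an alias node pointing at another key."""
    key: str
    target: ToyTaskRef


def constant_deps(args: Iterable[Any], unwrap_alias: bool) -> set:
    """Collect the keys that the given constant arguments depend on."""
    deps = set()
    for arg in args:
        if unwrap_alias and isinstance(arg, ToyAlias):
            # Follow the alias to the reference it points at.
            arg = arg.target
        if isinstance(arg, ToyTaskRef):
            deps.add(arg.key)
        # Anything else (plain literals, un-unwrapped aliases) is ignored.
    return deps


args = [ToyAlias("delayed-x", ToyTaskRef("delayed-x")), ToyTaskRef("other"), 42]
missing = constant_deps(args, unwrap_alias=False)   # alias is skipped
complete = constant_deps(args, unwrap_alias=True)   # alias is followed
```

In this toy setting a culling pass that trusts `missing` would drop "delayed-x" even though the graph still needs it, which matches the symptom reported in this thread.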

lgray · 2024-12-07T20:04:45Z

@fjetter is this skipping of Aliases as constant dependencies of blockwise graphs desired or expected?

lgray · 2024-12-07T20:14:42Z

I also find that if I use TaskRef instead of Alias for our Delayed object everything optimizes and executes correctly without issue.

Understanding of what is correct would be appreciated. For the time being I will change the dask_awkward code to use a TaskRef as opposed to an Alias to see what else breaks in our tests.

lgray · 2024-12-07T20:34:14Z

OK - only failing tests are over in uproot where we'll have to patch things up to deal with Tasks there as well!

@jpivarski

lgray · 2024-12-07T20:36:26Z

However, we should settle these correct usage issues and get bugfixes in the right places if they are necessary.

We then wait on further input and guidance from @fjetter as to correct/expected usage.

lgray · 2024-12-07T20:57:27Z

It's also a bit weird that the GraphNode base class's .ref() function returns an Alias. Hmm.

Judging from that I guess we'd want _cull_dependencies to deal with Aliases in some way, but perhaps the most correct fix is elsewhere.

lgray · 2024-12-09T17:44:55Z

@fjetter when you have time, we would appreciate your commentary so that we can resolve this.

lgray · 2024-12-11T14:15:58Z

@fjetter just a ping

martindurant · 2024-12-12T14:25:37Z

Thanks for all your effort here, @lgray . I hope @fjetter can OK the code now.

martindurant · 2024-12-12T14:28:20Z

pyproject.toml

@@ -37,7 +37,8 @@ classifiers = [
 ]
 dependencies = [
  "awkward >=2.5.1",
-  "dask >=2023.04.0",
+  "dask >=2024.12.0;python_version>'3.9'",
+  "dask >=2023.04.0;python_version<'3.10'",


In live discussions, we were tending towards dropping backward compatibility here, which means dropping py3.9 support (which dask and numpy already have). Users of py3.9 will not have hit the original problem, since the new dask was not released for them.

This would also save about half the LOC in this PR.

lgray · 2024-12-16T15:44:39Z

@fjetter are you available to discuss this? Thanks!

fjetter · 2024-12-16T15:57:12Z

TaskRef is the way to go. That's one of the very sharp edges we still have.

dropping backward compatibility here, w

If you wanted to maintain this, I would likely recommend vendoring. The old classes still work. Legacy graphs generally still work. You just got hit by me refactoring Blockwise right away (I looked at your code but missed this, apologies).

fjetter

Looks good

fjetter · 2024-12-16T16:00:34Z

src/dask_awkward/lib/core.py

@@ -1928,7 +1935,10 @@ def partitionwise_layer(
            pairs.extend([arg.name, "i"])
            numblocks[arg.name] = (1,)
        elif isinstance(arg, Delayed):
-            pairs.extend([arg.key, None])
+            if _dask_uses_tasks:
+                pairs.extend([TaskRef(arg.key), None])


yes, that's correct 👍

fjetter · 2024-12-16T16:40:36Z

src/dask_awkward/lib/optimize.py

+        new_layer = copy.deepcopy(layer)
+        task = new_layer.task.copy()


My guess is that the task-specific copy is not required after the deepcopy. I was already contemplating whether we should get rid of copy (because it is difficult to maintain, would require subclasses to override it, and we might want to make use of subclassing).
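The reasoning behind that guess can be checked on toy objects. `ToyTask` and `ToyLayer` below are hypothetical classes for illustration only; they are not dask's `Task` or the layer type used in `optimize.py`, so this shows the general behavior of `copy.deepcopy`, not a guarantee about dask's classes.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyTask:
    """Hypothetical task holding a mutable argument list."""
    args: list = field(default_factory=list)

    def copy(self):
        return ToyTask(list(self.args))


@dataclass
class ToyLayer:
    """Hypothetical layer that owns a single task."""
    task: ToyTask


layer = ToyLayer(ToyTask(["a", "b"]))

# deepcopy recursively copies the nested task and its argument list.
new_layer = copy.deepcopy(layer)
new_layer.task.args.append("c")
```

After the deepcopy, `new_layer.task` is already a distinct object with its own `args`, so calling `.copy()` on it again would only produce another independent copy of the same data.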

fjetter · 2024-12-16T16:41:06Z

src/dask_awkward/lib/optimize.py

+                    arg.key if isinstance(arg, GraphNode) else arg
+                    for arg in layer.task.args
+                ]
+                # how to do this with `.substitute(...)`?


is this still an open question?

pfackeldey · 2024-12-16

Yes, I was unsure how to implement this with .substitute(). I used our internal function instead, but it would be nice to use .substitute if that does the same thing.

It's not a show stopper right now though.

lgray · 2024-12-16T20:08:48Z

@fjetter Just for reference, the source of confusion about Alias vs. TaskRef w.r.t. constant delayed objects comes from here: https://github.com/dask/dask/blob/main/dask/blockwise.py#L388-L404

I realize it's a sharp edge but since you said TaskRef is correct here, it would be good to correct/clarify the docs for posterity.

lgray · 2024-12-16T20:12:52Z

@martindurant @pfackeldey I would say we merge as-is, and deal with the backwards compat stuff later. We should get uproot and coffea passing again with some priority...

lgray · 2024-12-16T21:21:42Z

uproot was an easy fix scikit-hep/uproot5#1352

lgray added 4 commits December 4, 2024 09:11

fix: adapt to new Task spec in dask, now used in blockwise

6158208

drop py3.8 from tests

d8cc3e4

ah, we need version check logic instead, great...

7312b31

whitespace

f3461bb

guard against missing _task_spec and Task classes in older dask

e100c45

martindurant mentioned this pull request Dec 4, 2024

Dask release 2024.12.0 changed dsk keyword in Blockwise class this breaks AwkwardBlockwiseLayer __init__ #557

Open

lgray and others added 4 commits December 4, 2024 09:51

adjust min version requirements

fc46473

[pre-commit.ci] auto fixes from pre-commit.com hooks

341500a

for more information, see https://pre-commit.ci

commas are good things

8780dae

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f96f4f

lgray and others added 2 commits December 4, 2024 09:53

would you kindly...

d7b3d9f

[pre-commit.ci] auto fixes from pre-commit.com hooks

883b23e

jpwgnr mentioned this pull request Dec 4, 2024

dask-awkward issue leads to not working uproot.dask scikit-hep/uproot5#1346

Open

appease mypy's insatiable lust for perfect correctness

c25e1f6

lgray and others added 8 commits December 4, 2024 10:10

cleaner way of dealing with it

7c2174c

[pre-commit.ci] auto fixes from pre-commit.com hooks

2e6da4e

forgot to update names

06a475c

mypy

ef10929

missing types

954f6e6

mypy...

78d6503

... mypy

3a88ec6

just ignore it all

01b0a45

lgray and others added 6 commits December 7, 2024 14:17

use TaskRef to pass test - may not be correct

bdb42e7

[pre-commit.ci] auto fixes from pre-commit.com hooks

91e4890

import _dask_uses_tasks

6420dcc

[pre-commit.ci] auto fixes from pre-commit.com hooks

cee57c8

don't import in the function call

57a2472

better check for dask._task_spec

6a19642

avoid imports in loops

5d01fef

nsmith- mentioned this pull request Dec 9, 2024

Migrate from defaultdict to Counter cms-nanoAOD/correctionlib#267

Merged

ikrommyd mentioned this pull request Dec 12, 2024

Incompatibility with dask 2024.12.0 scikit-hep/coffea#1230

Closed

martindurant reviewed Dec 12, 2024

fjetter approved these changes Dec 16, 2024

lgray mentioned this pull request Dec 16, 2024

fix: uproot was exposed in one place to dask's _task_spec overhaul scikit-hep/uproot5#1352

Merged

lgray mentioned this pull request Dec 16, 2024

Drop backwards compatibility for old-style dask tasks #560

Open

lgray merged commit d3f3e7c into main Dec 16, 2024
24 of 25 checks passed
