-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add support for dask-awkward arrays in vector constructors #429
Conversation
8f947b6
to
6c5de35
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #429 +/- ##
==========================================
- Coverage 82.92% 82.81% -0.11%
==========================================
Files 96 96
Lines 11533 11559 +26
==========================================
+ Hits 9564 9573 +9
- Misses 1969 1986 +17 ☔ View full report in Codecov by Sentry. |
needs_behavior = not vector._awkward_registered | ||
for x in arrays: | ||
if needs_behavior: | ||
if x.behavior is None: | ||
x.behavior = vector.backends.awkward.behavior | ||
else: | ||
x.behavior = dict(x.behavior) | ||
x.behavior.update(vector.backends.awkward.behavior) | ||
else: | ||
x.behavior = None | ||
needs_behavior = False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this piece of code is still required. If it is, is there a way to set behavior
for dask_awkward
arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block looks important. We don't want to add a giant behavior
dict to every array if the relevant behaviors have already been installed globally.
All of the ak.*
functions can be used on dask-awkward arrays1, so the code block that you have below, which builds the vector by calling ak.with_name
passing the behavior
gets it into the new dask-awkward array.
Footnotes
-
Any
ak.*
function that takesak.Arrays
as arguments and returns anak.Array
can also be a function that takesdak.Arrays
as arguments and returns adak.Array
. Awkward now has an overload mechanism like NumPy's. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Given that behavior
of a dask_awkward.Array
has no setter, should I use conditional statements here to preserve the code block (subsequently using if
while passing behavior
in awkward.with_name
) or is there a better way to do this?
if vector._is_awkward_v2 and not isinstance(args[0], dask_awkward.Array):
needs_behavior = not vector._awkward_registered
for x in arrays:
if needs_behavior:
if x.behavior is None:
x.behavior = vector.backends.awkward.behavior
else:
x.behavior = dict(x.behavior)
x.behavior.update(vector.backends.awkward.behavior)
else:
x.behavior = None
needs_behavior = False
assert 2 <= dimension <= 4, f"Dimension must be between 2-4, not {dimension}"
return awkward.with_name(
awkward.zip(
dict(__builtins__["zip"](names, arrays)), # type: ignore[index]
depth_limit=akarray.layout.purelist_depth,
),
_recname(is_momentum, dimension),
behavior=vector.backends.awkward.behavior
if vector._is_awkward_v2 and isinstance(args[0], dask_awkward.Array))
else None,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but I think you need the block that checks to see if the Vector behaviors have been installed globally before deciding whether to attach them all to the output array.
needs_behavior = not vector._awkward_registered | ||
for x in arrays: | ||
if needs_behavior: | ||
if x.behavior is None: | ||
x.behavior = vector.backends.awkward.behavior | ||
else: | ||
x.behavior = dict(x.behavior) | ||
x.behavior.update(vector.backends.awkward.behavior) | ||
else: | ||
x.behavior = None | ||
needs_behavior = False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block looks important. We don't want to add a giant behavior
dict to every array if the relevant behaviors have already been installed globally.
All of the ak.*
functions can be used on dask-awkward arrays1, so the code block that you have below, which builds the vector by calling ak.with_name
passing the behavior
gets it into the new dask-awkward array.
Footnotes
-
Any
ak.*
function that takesak.Arrays
as arguments and returns anak.Array
can also be a function that takesdak.Arrays
as arguments and returns adak.Array
. Awkward now has an overload mechanism like NumPy's. ↩
ccf48d3
to
8c00d17
Compare
f50e2da
to
4c9b5e8
Compare
@jpivarski, could you please take a quick look at this again? I'll merge this today. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is right: you just use ak.with_name
and it does the right thing for dask_awkward.Array
s.
I'm a little unsure of the typing, and this goal of having mypy be able to type-check the library and user code might be competing with other goals of simplicity and not getting tangled in dependencies that may be unnecessary at runtime.
Both goals should be attainable. I'd like someone with more Python typing experience to check it, maybe @henryiii? (If you're not too busy?)
ak_array_types: tuple[typing.Any, ...] = (awkward.Array,) | ||
|
||
if vector._is_awkward_v2: | ||
import dask_awkward # type: ignore[import-not-found] | ||
|
||
ak_array_types += (dask_awkward.Array,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of why I think we should drop Awkward 1 support soon. Generally, the typing code should be declarative.
By the way, I think you can declare a dask_awkward.Array
dependency without importing dask_awkward
by using a string: "dask_awkward.Array"
. I'm not 100% sure about typing syntax; maybe this only works past a particular Python version. However, we want to be able to allow some types without necessarily importing them (a start-up time cost) or even having them installed (a dependency-complexity cost).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, yes, that makes sense. I could remove awkward v1 support in this PR itself, but it might be better to do that in the v1.3.1
/v1.4.0
release. PEP 484 mentions forward referencing to deal with such cases, but that will only be helpful while adding type hints.
I have pushed a commit rewriting the logic for handling dask_awkward
arrays. The implementation does not require importing dask_awkward
. Please let me know if that works!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping a major dependency version should happen at some significant version threshold in Vector. Maybe 1.3.x → 1.4.0, or maybe 1.x → 2.0.0.
I like the idea of 2.0.0 (Uproot increases major version when Awkward increases major version), but this has scared users in the past: seeing Uproot 4 → 5 made some people think that the changes in Uproot were major, rather than the changes in its dependencies.
So maybe 1.4.0, yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll drop awkward v1 in the next release.
Description
Fixes #301
Kindly take a look at CONTRIBUTING.md.
Please describe the purpose of this pull request. Reference and link to any relevant issues or pull requests.
Checklist
$ pre-commit run --all-files
or$ nox -s lint
)$ pytest
or$ nox -s tests
)$ cd docs; make clean; make html
or$ nox -s docs
)$ pytest --doctest-plus src/vector/
or$ nox -s doctests
)Before Merging