-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test case that shows multi-asset failure #17077
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
@jamiedemaria @Ramshackle-Jamathon got to this here #17105 |
|
||
for i, spec in enumerate([upstream_asset, downstream_asset]): | ||
|
||
@multi_asset(specs=[spec], name=f"_an_asset_op_{i}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue in this test was because the multi_assets were given the same name. The assets are auto-assigned aliases, but the underlying nodes still have the same name. we also maintain a mapping of node name to node definition, but since both assets have the same underlying node name, only one NodeDefinition is kept around (and it's downstream asset, i think because it was created last).
This causes collisions when we validate dependencies because when we validate the deps for _an_asset_op
(made from the downstream_asset
spec) we want to get the outputs for _an_asset_op
(made from the upstream_asset
spec) the mapping of node name to NodeDefinition gives us the node for _an_asset_op
(made from the downstream_asset
spec), which doesnt have an upstream_asset
output
you get exactly the same issue if you do this:
@multi_asset(
outs={"x": AssetOut(), "y": AssetOut()}
)
def a():
return 1, 2
@multi_asset(
outs={"i": AssetOut(), "j": AssetOut()}
)
def a(): # redefined a!
return 3, 4
@asset(
ins={"x": AssetIn(), "y": AssetIn()}
)
def add(x, y):
return x + y
defs = Definitions(assets=[a, add])
and I think this is clearly a case we can't and shouldn't try to support, which to me means we shouldn't try to support the case when the nodes are made in a loop
I don't think we can solve this with a backend change because of the op alias case. If we have this
from dagster import op, job
@op
def generic_adder(x):
return x + 1
@op
def one():
return 1
@op
def two():
return 2
@job
def do_math():
generic_adder(one())
generic_adder(two())
We want both generic_adder
s to be assigned aliases, but still point to the same underlying node
## Summary & Motivation Adds an External Assets concept page (motivation described in #16754). This also contains a code change necessary because of the bug demonstrated in #17077. ## How I Tested These Changes BK. Also loaded examples in `dagster dev` --------- Co-authored-by: Erin Cochran <erin.k.cochran@gmail.com> Co-authored-by: Yuhan Luo <4531914+yuhan@users.noreply.github.com>
Adds an External Assets concept page (motivation described in #16754). This also contains a code change necessary because of the bug demonstrated in #17077. BK. Also loaded examples in `dagster dev` --------- Co-authored-by: Erin Cochran <erin.k.cochran@gmail.com> Co-authored-by: Yuhan Luo <4531914+yuhan@users.noreply.github.com>
Summary & Motivation
Checking in a failing test case that I discovered writing a factory for external assets.
How I Tested These Changes
BK