-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: dagstermill io use io manager for output notebook #4490
Conversation
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/elementl/dagster/7wzLG346E5vpZVEziX5GmpDZovf5 |
c11f2e9
to
a37577c
Compare
I know we talked about this in the past, but refresh my memory - what's a situation where it's helpful for the notebook to be an output? Do we know of users that use the notebook in downstream steps? What do they do with it? |
I'm not 100% following why we need to make this change. What's the downside of preserving existing behavior here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the existing code, you will need to specify an "output_notebook_io_manager" resource, we provide backcompact_output_notebook_io_manager which depends on file manager for easier migration:
I'm not 100% following why we need to make this change. What's the downside of preserving existing behavior here?
It's an easy path for us that we don't have to maintain the code path like [1] and [2]. But you're right this is not necessary - will explore preserving existing behavior!
a37577c
to
5756cd5
Compare
5756cd5
to
b0980ba
Compare
b6b2cf8
to
ad80495
Compare
fe82551
to
36c4101
Compare
1c5d3d7
to
e618826
Compare
cdee3ed
to
59bdfbc
Compare
@@ -355,14 +355,14 @@ def filepicklelist_resource(init_context): | |||
resource_defs={ | |||
"list": ResourceDefinition(lambda _: []), | |||
"io_manager": fs_io_manager, | |||
"file_manager": local_file_manager, | |||
"output_notebook_io_manager": local_output_notebook_io_manager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could fs_io_manager
be used here for the output_notebook_io_manager
as well? If so, would it make sense for the default IO manager key for the notebook output to just be "io_manager"?
I don't have an opinion here either way, just exploring the options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no because 1) fs_io_manager
pickles the content so users can't open the persisted file right away 2) fs_io_manager
doesn't come with the get_output_asset_key
so it won't yield asset materialization or provide handy asset key customization
OutputNotebookIOManager
base class kinda shield the asset key handling from notebook users - when they customize their own io manager by extending the class, they don't have to worry about "how do i get the notebook show up in asset catalog."
nice! was just looking for something like this |
Summary
depends on #4500
prior context: https://dagster.phacility.com/D6145
User-facing changes:
(1) Deprecating
output_notebook
in favor ofoutput_notebook_name
.output_notebook
would require "file_manager" and result in a FileHandle output - no breaking change to existing users.output_notebook_name
would result in a bytes output and require "output_notebook_io_manager", see details in (2)(2) With the new param, it requires a dedicated IO manager for output notebook. When
output_notebook
oroutput_notebook_name
is specified, "output_notebook_io_manager" is required as a resource.local_output_notebook_io_manager
for handling local output notebook materialization.OutputNotebookIOManager
. This enables use cases:get_output_asset_key
.handle_output
andload_input
, e.g. they can control the file name of the output notebook.handle_output
methodthis is also fixing #3380
Test Plan
bk
fan_in_notebook_pipeline_legacy
uses output_notebook:fan_in_notebook_pipeline
uses output_notebook_name:Migration guide
old (deprecated, won't break):
new: