-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Fix StructuredDataset empty-str file_format
in dc attr access
#3027
base: master
Are you sure you want to change the base?
[BUG] Fix StructuredDataset empty-str file_format
in dc attr access
#3027
Conversation
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Code Review Agent Run Status
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3027 +/- ##
==========================================
+ Coverage 79.52% 82.79% +3.26%
==========================================
Files 201 3 -198
Lines 21250 186 -21064
Branches 2729 0 -2729
==========================================
- Hits 16900 154 -16746
+ Misses 3574 32 -3542
+ Partials 776 0 -776 ☔ View full report in Codecov by Sentry. |
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Code Review Agent Run Status
|
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
Code Review Agent Run Status
|
Signed-off-by: JiaWei Jiang <waynechuang97@gmail.com>
file_format
in dc attr accessfile_format
in dc attr access
Code Review Agent Run #2ae7a3Actionable Suggestions - 0Review Details
|
Changelist by BitoThis pull request implements the following key changes.
|
Tracking issue
Closes flyteorg/flyte#6096.
Why are the changes needed?
When users create a
StructuredDataset
with a specifiedfile_format
(e.g.,parquet
), thefile_format
information will be accidentally discarded in this case duringasync_to_literal
call. To be concrete,StructuredDatasetType
'sformat
attribute is set toGENERIC_FORMAT
, which is an empty string""
.What changes were proposed in this pull request?
Override
StructuredDatasetType
'sformat
attribute when users explicitly setfile_format
of python nativeStructuredDataset
.How was this patch tested?
This patch is tested through the newly added integration test and double checked by observing the flyte console I/O and the task pod stdout.
Setup process
For local run, the setup process is summarized as follows:
After installation, run the following command:
Screenshots
The following results are expected:
Flyte console input
Flyte console output
Task pod stdout
Check all the applicable boxes
Related PRs
Docs link
Summary by Bito
Fixed a critical bug in StructuredDataset where user-specified file_format information was being lost during async_to_literal conversion. The fix implements format attribute override in StructuredDatasetType to preserve user-defined formats. Changes include transformer engine modifications and integration tests.Unit tests added: True
Estimated effort to review (1-5, lower is better): 2