-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce struct nullability in JSON raw reader (#3900) (#3904) #3906
Enforce struct nullability in JSON raw reader (#3900) (#3904) #3906
Conversation
@@ -514,8 +514,8 @@ mod tests { | |||
Field::new( | |||
"nested", | |||
DataType::Struct(vec![ | |||
Field::new("a", DataType::Int32, false), | |||
Field::new("b", DataType::Int32, false), | |||
Field::new("a", DataType::Int32, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was previously wrong 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because
{"list": null, "nested": {"a": null}}
has a null
for nested:a
, right? So clearly the field needs to be nullable
@@ -594,7 +594,7 @@ mod tests { | |||
"list2", | |||
DataType::List(Box::new(Field::new( | |||
"element", | |||
DataType::Struct(vec![Field::new("d", DataType::Int32, false)]), | |||
DataType::Struct(vec![Field::new("d", DataType::Int32, true)]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As was this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue that in
{"list": [], "nested": {"a": 1, "b": 2}, "nested_list": {"list2": [{"c": 3, "d": 5}, {"c": 4}]}}
The second list element, {"c": 4}
, has an implict null in d
?
Labelling this api-change as there is potential for code that used to "work", albeit incorrectly, to now result in an error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a few test suggestions, but otherwise looks good to me. Thank you @tustvold
null_chunks.remainder_bits(), | ||
value_chunks.remainder_bits(), | ||
))) | ||
.zip(value_chunks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drive by consolidation it look like 👍
@@ -514,8 +514,8 @@ mod tests { | |||
Field::new( | |||
"nested", | |||
DataType::Struct(vec![ | |||
Field::new("a", DataType::Int32, false), | |||
Field::new("b", DataType::Int32, false), | |||
Field::new("a", DataType::Int32, true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because
{"list": null, "nested": {"a": null}}
has a null
for nested:a
, right? So clearly the field needs to be nullable
@@ -594,7 +594,7 @@ mod tests { | |||
"list2", | |||
DataType::List(Box::new(Field::new( | |||
"element", | |||
DataType::Struct(vec![Field::new("d", DataType::Int32, false)]), | |||
DataType::Struct(vec![Field::new("d", DataType::Int32, true)]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the issue that in
{"list": [], "nested": {"a": 1, "b": 2}, "nested_list": {"list2": [{"c": 3, "d": 5}, {"c": 4}]}}
The second list element, {"c": 4}
, has an implict null in d
?
])); | ||
|
||
let batches = do_read(json, 1024, true, schema); | ||
assert_eq!(batches.len(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if asserting the contents with assert_batches_eq
or similar would be helpful here
Also would it help to verify that the key is not null is enforced. Like that this document errors{ add: { partition_values: [ foo: bar, null: baz ] } }
?
.unwrap(); | ||
assert!(reader.next().unwrap().is_err()); // Should error as not nullable | ||
|
||
// Test nulls in nullable parent can mask nulls in non-nullable child |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also verify that this case errors too?
let null = r#"{"foo": {bar: null}}"#;
…ty-raw-struct-reader
Which issue does this PR close?
Closes #3900
Closes #3904
Rationale for this change
See tickets
What changes are included in this PR?
Are there any user-facing changes?