-
Notifications
You must be signed in to change notification settings - Fork 901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix JSON parsing memory corruption - Fix Mixed types nested children removal #15798
Changes from 3 commits
5819de3
b2334cb
1519f39
d98bb83
2bf1216
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2679,4 +2679,40 @@ TEST_F(JsonReaderTest, JsonNestedDtypeFilter) | |
} | ||
} | ||
|
||
TEST_F(JsonReaderTest, JSONMixedTypeChildren) | ||
{ | ||
std::string const json_str = R"( | ||
{ "Root": { "Key": [ { "EE": "A" } ] } } | ||
{ "Root": { "Key": { } } } | ||
{ "Root": { "Key": [{ "YY": 1}] } } | ||
)"; | ||
// Column "EE" is created and destroyed | ||
// Column "YY" should not be created | ||
|
||
cudf::io::json_reader_options options = | ||
cudf::io::json_reader_options::builder(cudf::io::source_info{json_str.c_str(), json_str.size()}) | ||
.lines(true) | ||
.recovery_mode(cudf::io::json_recovery_mode_t::RECOVER_WITH_NULL) | ||
.normalize_single_quotes(true) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit: Since we do not have single quotes in the input string, do we need to enable single quotes normalization here? Another minor nit is that we can remove There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, those options are not required. I used the same options as repro code for the bug, but I reduced the input json to minimal. |
||
.normalize_whitespace(false) | ||
.mixed_types_as_string(true) | ||
.keep_quotes(true); | ||
|
||
auto result = cudf::io::read_json(options); | ||
|
||
ASSERT_EQ(result.tbl->num_columns(), 1); | ||
ASSERT_EQ(result.metadata.schema_info.size(), 1); | ||
EXPECT_EQ(result.metadata.schema_info[0].name, "Root"); | ||
ASSERT_EQ(result.metadata.schema_info[0].children.size(), 1); | ||
EXPECT_EQ(result.metadata.schema_info[0].children[0].name, "Key"); | ||
ASSERT_EQ(result.metadata.schema_info[0].children[0].children.size(), 2); | ||
EXPECT_EQ(result.metadata.schema_info[0].children[0].children[0].name, "offsets"); | ||
// types | ||
EXPECT_EQ(result.tbl->get_column(0).type().id(), cudf::type_id::STRUCT); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Question: Can type checks also be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense, thank you for the explanation! |
||
EXPECT_EQ(result.tbl->get_column(0).child(0).type().id(), cudf::type_id::STRING); | ||
cudf::test::strings_column_wrapper expected({R"([ { "EE": "A" } ])", "{ }", R"([{ "YY": 1}])"}); | ||
|
||
CUDF_TEST_EXPECT_COLUMNS_EQUAL(expected, result.tbl->get_column(0).child(0)); | ||
} | ||
|
||
CUDF_TEST_PROGRAM_MAIN() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not understanding this comment.
Destroy references of all child columns after what?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my only comment about this PR as well. Happy to approve once this is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'after this step'.