Describe the bug
If we use the JSON projection capabilities with the indices in ascending order, everything works. With a descending or mixed order, however, the returned batch has the correct column names but the data is in the wrong order.
To Reproduce
Modify the sample file 1.json to contain simple strings and, using the test in json.rs as a base, run:
#[tokio::test]
async fn nd_json_exec_file_projection() -> Result<()> {
    let session_ctx = SessionContext::new();
    let task_ctx = session_ctx.task_ctx();
    let (object_store_url, file_groups, file_schema) =
        prepare_store(&session_ctx, FileCompressionType::UNCOMPRESSED).await;

    let exec = NdJsonExec::new(
        FileScanConfig {
            object_store_url,
            file_groups,
            file_schema,
            statistics: Statistics::default(),
            projection: Some(vec![2, 0]), // inverted order from the original test
            limit: None,
            table_partition_cols: vec![],
            config_options: ConfigOptions::new().into_shareable(),
            output_ordering: None,
        },
        FileCompressionType::UNCOMPRESSED,
    );

    let inferred_schema = exec.schema();
    assert_eq!(inferred_schema.fields().len(), 2);

    inferred_schema.field_with_name("a").unwrap();
    inferred_schema.field_with_name("b").unwrap_err();
    inferred_schema.field_with_name("c").unwrap();
    inferred_schema.field_with_name("d").unwrap_err();

    assert_eq!(inferred_schema.index_of("c").unwrap(), 0); // schema maintains the projection order
    assert_eq!(inferred_schema.index_of("a").unwrap(), 1);

    let mut it = exec.execute(0, task_ctx)?;
    let batch = it.next().await.unwrap()?;

    assert_eq!(batch.schema().index_of("c").unwrap(), 0); // batch schema maintains the projection order
    assert_eq!(batch.schema().index_of("a").unwrap(), 1);
    assert_eq!(batch.num_rows(), 4);

    let mut values = batch
        .column(0)
        .as_any()
        .downcast_ref::<arrow::array::StringArray>()
        .unwrap();
    assert_eq!(values.value(1), "c2"); // error: value has a2 when the column is c

    values = batch
        .column(1)
        .as_any()
        .downcast_ref::<arrow::array::StringArray>()
        .unwrap();
    assert_eq!(values.value(0), "a1"); // error: value has c1 when the column is a

    Ok(())
}
Expected behaviour
The JSON scan should behave like the other implementations (Parquet or CSV), where the order of the projection indices doesn't matter: each output column holds the data of the column it is named after.
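To make the expected semantics concrete, here is a minimal, dependency-free sketch. It does not use DataFusion or Arrow: `ToyBatch`, `sample`, `project_in_order`, and `project_mislabelled` are hypothetical names invented for illustration. `project_in_order` shows the behaviour asked for above (output column i is source column `projection[i]`), while `project_mislabelled` is only a toy model of the reported symptom, assuming the names follow the projection order while the data follows the file's column order. It is not a claim about where the actual bug lives.

```rust
/// A toy "batch": column names paired with their string values.
/// Hypothetical type for illustration; this is not DataFusion's RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct ToyBatch {
    names: Vec<String>,
    columns: Vec<Vec<String>>,
}

/// Builds a four-column, four-row sample with values like "a1".."a4".
fn sample() -> ToyBatch {
    let col = |p: &str| (1..=4).map(|i| format!("{p}{i}")).collect::<Vec<String>>();
    ToyBatch {
        names: vec!["a".into(), "b".into(), "c".into(), "d".into()],
        columns: vec![col("a"), col("b"), col("c"), col("d")],
    }
}

/// Expected semantics: output column i is source column projection[i],
/// so names and data are reordered together.
fn project_in_order(src: &ToyBatch, projection: &[usize]) -> ToyBatch {
    ToyBatch {
        names: projection.iter().map(|&i| src.names[i].clone()).collect(),
        columns: projection.iter().map(|&i| src.columns[i].clone()).collect(),
    }
}

/// Toy model of the reported symptom (an assumption, not the real code path):
/// names follow the projection order, data follows ascending source order.
fn project_mislabelled(src: &ToyBatch, projection: &[usize]) -> ToyBatch {
    let mut ascending = projection.to_vec();
    ascending.sort_unstable();
    ToyBatch {
        names: projection.iter().map(|&i| src.names[i].clone()).collect(),
        columns: ascending.iter().map(|&i| src.columns[i].clone()).collect(),
    }
}
```

With the projection `[2, 0]`, `project_in_order(&sample(), &[2, 0])` yields columns named `c` and `a` holding the `c` and `a` values respectively, whereas `project_mislabelled` keeps the same names but swaps the data, which matches the two failing assertions in the test above.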
tiago-ssantos changed the title from "Datafusion using JSON projection only work when the index is in ascending order" to "JSON projection only work when the index is in ascending order" on Jan 6, 2023
@tiago-ssantos, thank you for reporting this issue! I can't find a release or point in time at which a JSON scan with an arbitrarily ordered projection executed correctly, but I believe it was fixed in 18.0.0 (or 18.0.0-rc1) by #5056.