-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avro Table Provider #910
Merged
Merged
Avro Table Provider #910
Changes from all commits
Commits
Show all changes
56 commits
Select commit
Hold shift + click to select a range
7e14f91
Add avro as a datasource, file and table provider
Igosuki 0f940ab
wip
Igosuki 2ca7ccb
Added support composite identifiers for struct type.
jorgecarleitao fcbf43b
Fixed build.
jorgecarleitao bae2f00
cheat and add unions to valid composite column types
Igosuki a352dad
Implement the AvroArrayReader
Igosuki d42cdd1
Add binary types
Igosuki cbdc9c7
Enable Avro as a FileType
Igosuki 7d6078e
Enable registering an avro table in the sql parsing
Igosuki 44cfa87
Change package name for datafusion/avro
Igosuki 6ef92d8
Implement Avro datasource tests and fix avro_rs::Value resolution to …
Igosuki 01cbb18
Test for AvroExec::try_from_path
Igosuki 537577a
external table avro test
Igosuki 75566d1
Basic schema conversion tests
Igosuki 7fdf233
Complete test for avro_to_arrow_reader on alltypes_dictionnary
Igosuki 697e3a9
fix_stable: .rewind is 'unstable'
Igosuki e1d6df8
Fix license files and remove the unused avro-converter crate
Igosuki 66a5901
fix example test in avro_to_arrow
Igosuki 9ea942c
add avro_sql test to default workflow
Igosuki 84ee28a
Adress clippies
Igosuki 9edac57
Enable avro as a valid datasource for client execution
Igosuki e8a6206
Add avro to available logical plan nodes
Igosuki 45eac7c
Add ToTimestampMillis as a scalar function in protos
Igosuki b4340ac
Allow Avro in PhysicalPlan nodes
Igosuki 408f759
Remove remaining confusing references to 'json' in avro mod
Igosuki f34b995
rename 'parquet' words in avro test and examples
Igosuki 6ce0904
Handle Union of nested lists in arrow reader
Igosuki e1e40ef
test timestamp arrays
Igosuki 1c79ffb
remove debug statement
Igosuki c96b781
Make avro optional
Igosuki 060c644
Remove debug statement
Igosuki 72ad35c
Remove GetField usage (see #628)
Igosuki 3c8f6ce
Fix docstring in parser tests
Igosuki 0db3013
Test batch output rather than just rows individually
Igosuki faa2152
Remove 'csv' from error strings in physical_plan::avro
Igosuki a6cd7f1
Avro sample sql and explain queries tests in sql.rs
Igosuki 8b4fcae
Activate avro feature for cargo tests in github workflow
Igosuki b2b7915
Add a test for avro registering multiple files in a single table
Igosuki 8acf062
Switch to Result instead of Option for resolve_string
Igosuki 65c9b9e
Address missing clippy warning should_implement_trait in arrow_to_avr…
Igosuki 1668b96
Add fmt display implementation for AvroExec
Igosuki c9218e2
ci: fix cargo sql run example, use datafusion/avro feature instead of…
Igosuki be09a64
license: missing license file for avro_to_arrow/schema.rs
Igosuki 68df268
only run avro datasource tests if features have 'avro'
Igosuki 09c4ecb
refactor: rename infer_avro_schema_from_reader to read_avro_schema_fr…
Igosuki 94b9dcb
Pass None as props to avro schema schema_to_field_with_props until fu…
Igosuki a03a3c2
Change schema inferance to FixedSizeBinary(16) for Uuid
Igosuki 405c63a
schema: prefix metadata coming from avro with 'avro'
Igosuki 291637a
make num traits optional and part of the avro feature flag
Igosuki 00f109c
Fix avro schema tests regarding external props
Igosuki ea31e02
split avro physical plan test feature wise and add a non-implemented …
Igosuki a264858
submodule: switch back to apache/arrow-testing
Igosuki c62d931
fix_test: columns are now prefixed in the plan
Igosuki aa45189
avro_test: fix clippy warning cmp-owned
Igosuki 0b746db
avro: move statistics to the physical plan
Igosuki 767c469
Increase min stack size for cargo tests
Igosuki File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,7 @@ use crate::serde::{protobuf, BallistaError}; | |
use datafusion::arrow::datatypes::{ | ||
DataType, Field, IntervalUnit, Schema, SchemaRef, TimeUnit, | ||
}; | ||
use datafusion::datasource::avro::AvroFile; | ||
use datafusion::datasource::{CsvFile, PartitionedFile, TableDescriptor}; | ||
use datafusion::logical_plan::{ | ||
window_frames::{WindowFrame, WindowFrameBound, WindowFrameUnits}, | ||
|
@@ -793,6 +794,19 @@ impl TryInto<protobuf::LogicalPlanNode> for &LogicalPlan { | |
}, | ||
)), | ||
}) | ||
} else if let Some(avro) = source.downcast_ref::<AvroFile>() { | ||
Ok(protobuf::LogicalPlanNode { | ||
logical_plan_type: Some(LogicalPlanType::AvroScan( | ||
protobuf::AvroTableScanNode { | ||
table_name: table_name.to_owned(), | ||
path: avro.path().to_owned(), | ||
projection, | ||
schema: Some(schema), | ||
file_extension: avro.file_extension().to_string(), | ||
filters, | ||
}, | ||
)), | ||
}) | ||
} else { | ||
Err(BallistaError::General(format!( | ||
"logical plan to_proto unsupported table provider {:?}", | ||
|
@@ -974,6 +988,7 @@ impl TryInto<protobuf::LogicalPlanNode> for &LogicalPlan { | |
FileType::NdJson => protobuf::FileType::NdJson, | ||
FileType::Parquet => protobuf::FileType::Parquet, | ||
FileType::CSV => protobuf::FileType::Csv, | ||
FileType::Avro => protobuf::FileType::Avro, | ||
}; | ||
|
||
Ok(protobuf::LogicalPlanNode { | ||
|
@@ -1098,7 +1113,13 @@ impl TryInto<protobuf::LogicalExprNode> for &Expr { | |
) | ||
} | ||
}; | ||
let arg = &args[0]; | ||
let arg_expr: Option<Box<protobuf::LogicalExprNode>> = if !args.is_empty() | ||
{ | ||
let arg = &args[0]; | ||
Some(Box::new(arg.try_into()?)) | ||
} else { | ||
None | ||
}; | ||
let partition_by = partition_by | ||
.iter() | ||
.map(|e| e.try_into()) | ||
|
@@ -1111,7 +1132,7 @@ impl TryInto<protobuf::LogicalExprNode> for &Expr { | |
protobuf::window_expr_node::WindowFrame::Frame(window_frame.into()) | ||
}); | ||
let window_expr = Box::new(protobuf::WindowExprNode { | ||
expr: Some(Box::new(arg.try_into()?)), | ||
expr: arg_expr, | ||
window_function: Some(window_function), | ||
partition_by, | ||
order_by, | ||
|
@@ -1284,7 +1305,7 @@ impl TryInto<protobuf::LogicalExprNode> for &Expr { | |
Expr::Wildcard => Ok(protobuf::LogicalExprNode { | ||
expr_type: Some(protobuf::logical_expr_node::ExprType::Wildcard(true)), | ||
}), | ||
Expr::TryCast { .. } => unimplemented!(), | ||
_ => unimplemented!(), | ||
} | ||
} | ||
} | ||
|
@@ -1473,6 +1494,9 @@ impl TryInto<protobuf::ScalarFunction> for &BuiltinScalarFunction { | |
BuiltinScalarFunction::SHA256 => Ok(protobuf::ScalarFunction::Sha256), | ||
BuiltinScalarFunction::SHA384 => Ok(protobuf::ScalarFunction::Sha384), | ||
BuiltinScalarFunction::SHA512 => Ok(protobuf::ScalarFunction::Sha512), | ||
BuiltinScalarFunction::ToTimestampMillis => { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the addition of |
||
Ok(protobuf::ScalarFunction::Totimestampmillis) | ||
} | ||
_ => Err(BallistaError::General(format!( | ||
"logical_plan::to_proto() unsupported scalar function {:?}", | ||
self | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍