ARROW-4466: [Rust] [DataFusion] Add support for Parquet data source #3851

Closed
wants to merge 38 commits into from
Changes from 36 commits
Commits
10710a2
Parquet datasource
andygrove Mar 9, 2019
ff3e5b7
test
andygrove Mar 9, 2019
3a412b1
first parquet test passes
andygrove Mar 9, 2019
322fc87
add test for reading strings from parquet
andygrove Mar 9, 2019
eaddafb
save
andygrove Mar 9, 2019
f46e6f7
save
andygrove Mar 9, 2019
aea9f8a
convert to use row iter
andygrove Mar 9, 2019
c3f71d7
add integration test
andygrove Mar 9, 2019
5ce3086
revert to columnar reads
andygrove Mar 10, 2019
b4981ed
implement more parquet column types and tests
andygrove Mar 10, 2019
6c3b7e2
add support for all primitive parquet types
andygrove Mar 10, 2019
debb2fb
code cleanup
andygrove Mar 10, 2019
157512e
Remove invalid TODO comment
andygrove Mar 10, 2019
dddb7d7
update to use partition-aware changes from master
andygrove Mar 10, 2019
7e1a98f
remove println and unwrap
andygrove Mar 10, 2019
c56510e
projection takes slice instead of vec
andygrove Mar 10, 2019
6457c36
use parquet::reader::schema::parquet_to_arrow_schema
andygrove Mar 12, 2019
e8aa784
revert temp debug change to error messages
andygrove Mar 12, 2019
607a29f
return result if there are null values
andygrove Mar 12, 2019
e6cbbaa
replace read_column! macro with generic
nevi-me Mar 13, 2019
3c711a5
immediately allocate vec
nevi-me Mar 13, 2019
306d07a
fmt
nevi-me Mar 13, 2019
5a3368c
Remove unnecessary slice, fix null handling
andygrove Mar 13, 2019
80cf303
add date support
andygrove Mar 13, 2019
1503855
handle nulls for binary data
andygrove Mar 13, 2019
639e13e
null handling for int96
andygrove Mar 13, 2019
9d3047a
code cleanup
andygrove Mar 13, 2019
2aeea24
remove println from tests
andygrove Mar 13, 2019
02b2ed3
fix int96 conversion to read timestamps correctly
nevi-me Mar 14, 2019
023dc25
Merge pull request #2 from nevi-me/ARROW-4466
andygrove Mar 14, 2019
1ec815b
Clean up imports
andygrove Mar 14, 2019
9b1308f
clean up handling of INT96 and DATE/TIME/TIMESTAMP types in schema co…
andygrove Mar 14, 2019
25d34ac
Make INT32/64/96 handling consistent with C++ implementation
andygrove Mar 14, 2019
73aa934
Remove println from test
andygrove Mar 14, 2019
204db83
fix timestamp nano issue
andygrove Mar 14, 2019
8d2df06
move schema projection function from arrow into datafusion
andygrove Mar 15, 2019
549c829
Remove hard-coded batch size, fix nits
andygrove Mar 15, 2019
3158529
add test for reading small batches
andygrove Mar 15, 2019
1 change: 1 addition & 0 deletions rust/datafusion/src/datasource/mod.rs
@@ -18,6 +18,7 @@
pub mod csv;
pub mod datasource;
pub mod memory;
+pub mod parquet;

pub use self::csv::{CsvBatchIterator, CsvFile};
pub use self::datasource::{RecordBatchIterator, ScanResult, Table};