Upgrade DataFusion to latest, to include fixes for aggregation #216

Merged 85 commits on Nov 9, 2023
b16cd93
Cleanup logical optimizer rules. (#7919)
mustafasrepo Oct 25, 2023
148f890
Parallelize Serialization of Columns within Parquet RowGroups (#7655)
devinjdangelo Oct 25, 2023
d190ff1
feat: Use bloom filter when reading parquet to skip row groups (#7821)
hengfeiyang Oct 25, 2023
48ea4b2
fix: don't push down volatile predicates in projection (#7909)
haohuaijin Oct 25, 2023
12a6316
Add `parquet` feature flag, enabled by default, and make parquet cond…
ongchi Oct 25, 2023
128d7c6
[MINOR]: Simplify enforce_distribution, minor changes (#7924)
mustafasrepo Oct 25, 2023
4881b5d
Add simple window query to sqllogictest (#7928)
Jefffrey Oct 25, 2023
0911f15
ci: upgrade node to version 20 (#7918)
crepererum Oct 26, 2023
12b473b
Change input for `to_timestamp` function to be seconds rather than na…
comphead Oct 26, 2023
6e87f59
Minor: Document `parquet` crate feature (#7927)
alamb Oct 26, 2023
a892300
Minor: reduce some #cfg(feature = "parquet") (#7929)
alamb Oct 26, 2023
ae9a446
Minor: reduce use of cfg(parquet) in tests (#7930)
alamb Oct 26, 2023
30e5f42
Fix CI failures on `to_timestamp()` calls (#7941)
comphead Oct 26, 2023
a9d66e2
minor: add a datatype casting for the updated value (#7922)
jonahgao Oct 26, 2023
74fc6f8
fix (#7946)
haohuaijin Oct 27, 2023
46ae9a4
Add simple exclude all columns test to sqllogictest (#7945)
Jefffrey Oct 28, 2023
250e716
Support Partitioning Data by Dictionary Encoded String Array Types (#…
devinjdangelo Oct 28, 2023
d28c79d
Minor: Remove array() in array_expression (#7961)
jayzhan211 Oct 28, 2023
b02fe5b
Minor: simplify update code (#7943)
alamb Oct 28, 2023
9ee055a
Add some initial content about creating logical plans (#7952)
andygrove Oct 28, 2023
d24228a
Minor: Change from `&mut SessionContext` to `&SessionContext` in subs…
my-vegetable-has-exploded Oct 29, 2023
f388a2b
Fix crate READMEs (#7964)
Jefffrey Oct 29, 2023
4a91ce9
Minor: Improve `HashJoinExec` documentation (#7953)
alamb Oct 29, 2023
9b45967
chore: clean useless clone baesd on clippy (#7973)
Weijun-H Oct 29, 2023
806a963
Add README.md to `core`, `execution` and `physical-plan` crates (#7970)
alamb Oct 30, 2023
fdb5454
Move source repartitioning into `ExecutionPlan::repartition` (#7936)
alamb Oct 30, 2023
3fd8a20
minor: fix broken links in README.md (#7986)
jonahgao Oct 30, 2023
7ee2c0b
Minor: Upate the `sqllogictest` crate README (#7971)
alamb Oct 30, 2023
bb1d7f9
Improve MemoryCatalogProvider default impl block placement (#7975)
lewiszlw Oct 30, 2023
448dff5
Fix `ScalarValue` handling of NULL values for ListArray (#7969)
viirya Oct 30, 2023
0d4dc36
Refactor of Ordering and Prunability Traversals and States (#7985)
berkaysynnada Oct 30, 2023
3d78bf4
Keep output as scalar for scalar function if all inputs are scalar (#…
viirya Oct 31, 2023
d8e413c
Fix crate READMEs for core, execution, physical-plan (#7990)
Jefffrey Oct 31, 2023
8fcc5e0
Update sqlparser requirement from 0.38.0 to 0.39.0 (#7983)
jackwener Oct 31, 2023
27e64ae
Fix panic in multiple distinct aggregates by fixing `ScalarValue::new…
alamb Oct 31, 2023
747cb50
MemoryReservation exposes MemoryConsumer (#8000)
milenkovicm Oct 31, 2023
656c6a9
fix: generate logical plan for `UPDATE SET FROM` statement (#7984)
jonahgao Oct 31, 2023
3185783
Create temporary files for reading or writing (#8005)
smallzhongfeng Nov 1, 2023
aef95ed
doc: minor fix to SortExec::with_fetch comment (#8011)
westonpace Nov 1, 2023
69ba82f
Fix: dataframe_subquery example Optimizer rule `common_sub_expression…
smallzhongfeng Nov 1, 2023
4e60cdd
Percent Decode URL Paths (#8009) (#8012)
tustvold Nov 1, 2023
e98625c
Minor: Extract common deps into workspace (#7982)
lewiszlw Nov 1, 2023
7d1cf91
minor: change some plan_err to exec_err (#7996)
waynexia Nov 1, 2023
7788b90
Minor: error on unsupported RESPECT NULLs syntax (#7998)
alamb Nov 1, 2023
94dac76
GroupedHashAggregateStream breaks spill batch (#8004)
milenkovicm Nov 1, 2023
35e8e33
Minor: Add implementation examples to ExecutionPlan::execute (#8013)
tustvold Nov 1, 2023
06f2475
address comment (#7993)
jayzhan211 Nov 1, 2023
5634cce
GroupedHashAggregateStream should register spillable consumer (#8002)
milenkovicm Nov 1, 2023
7f3f465
fix: single_distinct_aggretation_to_group_by fail (#7997)
haohuaijin Nov 2, 2023
436a4fa
Read only enough bytes to infer Arrow IPC file schema via stream (#7962)
Jefffrey Nov 2, 2023
d2671cd
Minor: remove a strange char (#8030)
haohuaijin Nov 2, 2023
b089137
Minor: Improve documentation for Filter Pushdown (#8023)
alamb Nov 2, 2023
8682be5
Minor: Improve `ExecutionPlan` documentation (#8019)
alamb Nov 2, 2023
0fa4ce9
fix: clippy warnings from nightly rust 1.75 (#8025)
waynexia Nov 2, 2023
661d211
Minor: Avoid recomputing compute_array_ndims in align_array_dimension…
jayzhan211 Nov 2, 2023
b2a1668
Minor: fix doc check (#8037)
alamb Nov 3, 2023
8c42d94
Minor: remove uncessary #cfg test (#8036)
alamb Nov 3, 2023
2906a24
Minor: Improve documentation for `PartitionStream` and `StreamingTab…
alamb Nov 3, 2023
c2e7680
Combine Equivalence and Ordering equivalence to simplify state (#8006)
mustafasrepo Nov 3, 2023
41effc4
Encapsulate `ProjectionMapping` as a struct (#8033)
alamb Nov 4, 2023
8acdb07
Minor: Fix bugs in docs for `to_timestamp`, `to_timestamp_seconds`, .…
alamb Nov 4, 2023
2af326a
Improve comments for `PartitionSearchMode` struct (#8047)
ozankabak Nov 4, 2023
3469c4e
General approach for Array replace (#8050)
jayzhan211 Nov 4, 2023
e505cdd
Minor: Remove the irrelevant note from the Expression API doc (#8053)
ongchi Nov 5, 2023
b54990d
Minor: Add more documentation about Partitioning (#8022)
alamb Nov 5, 2023
40a3cd0
Minor: improve documentation for IsNotNull, DISTINCT, etc (#8052)
alamb Nov 6, 2023
e95e3f8
Prepare 33.0.0 Release (#8057)
andygrove Nov 6, 2023
af3ce6b
Minor: improve error message by adding types to message (#8065)
alamb Nov 6, 2023
308c354
Minor: Remove redundant BuiltinScalarFunction::supports_zero_argument…
2010YOUY01 Nov 6, 2023
223a7fb
Add example to ci (#8060)
smallzhongfeng Nov 7, 2023
07c08a3
Update substrait requirement from 0.18.0 to 0.19.0 (#8076)
dependabot[bot] Nov 7, 2023
06fd26b
Fix incorrect results in COUNT(*) queries with LIMIT (#8049)
msirek Nov 7, 2023
56f6437
feat: Support determining extensions from names like `foo.parquet.sna…
Weijun-H Nov 7, 2023
f3c9009
Use FairSpillPool for TaskContext with spillable config (#8072)
viirya Nov 7, 2023
0506a5c
Minor: Improve HashJoinStream docstrings (#8070)
alamb Nov 7, 2023
724bafd
Fixing broken link (#8085)
edmondop Nov 8, 2023
3446382
fix: DataFusion suggests invalid functions (#8083)
jonahgao Nov 8, 2023
aefee03
Replace macro with function for `array_repeat` (#8071)
jayzhan211 Nov 8, 2023
15d8c9b
Minor: remove unnecessary projection in `single_distinct_to_group_by`…
haohuaijin Nov 8, 2023
b7251e4
minor: Remove duplicate version numbers for arrow, object_store, and …
andygrove Nov 8, 2023
21b2af1
fix: add match encode/decode scalar function type (#8089)
Syleechan Nov 8, 2023
965b318
feat: Protobuf serde for Json file sink (#8062)
Jefffrey Nov 8, 2023
a70369c
Minor: use `Expr::alias` in a few places to make the code more concis…
alamb Nov 9, 2023
2e38489
Minor: Cleanup BuiltinScalarFunction::return_type() (#8088)
2010YOUY01 Nov 9, 2023
7570f34
Expose metrics from FileSinkExec impl of ExecutionPlan
thinkharderdev Nov 3, 2023
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -37,4 +37,4 @@ If there are user-facing changes then we may require documentation to be updated

<!--
If there are any breaking changes to public APIs, please add the `api change` label.
--->
+-->
2 changes: 1 addition & 1 deletion .github/workflows/dev.yml
@@ -43,7 +43,7 @@ jobs:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
-node-version: "14"
+node-version: "20"
- name: Prettier check
run: |
# if you encounter error, rerun the command below and commit the changes
16 changes: 2 additions & 14 deletions .github/workflows/rust.yml
@@ -139,19 +139,7 @@ jobs:
# test datafusion-sql examples
cargo run --example sql
# test datafusion-examples
-cargo run --example avro_sql --features=datafusion/avro
-cargo run --example csv_sql
-cargo run --example custom_datasource
-cargo run --example dataframe
-cargo run --example dataframe_in_memory
-cargo run --example deserialize_to_struct
-cargo run --example expr_api
-cargo run --example parquet_sql
-cargo run --example parquet_sql_multiple_files
-cargo run --example memtable
-cargo run --example rewrite_expr
-cargo run --example simple_udf
-cargo run --example simple_udaf
+ci/scripts/rust_example.sh
- name: Verify Working Directory Clean
run: git diff --exit-code

@@ -527,7 +515,7 @@ jobs:
rust-version: stable
- uses: actions/setup-node@v4
with:
-node-version: "14"
+node-version: "20"
- name: Check if configs.md has been modified
run: |
# If you encounter an error, run './dev/update_config_docs.sh' and commit
41 changes: 38 additions & 3 deletions Cargo.toml
@@ -32,6 +32,7 @@ members = [
"datafusion/substrait",
"datafusion/wasmtest",
"datafusion-examples",
+"docs",
"test-utils",
"benchmarks",
]
@@ -45,17 +46,50 @@ license = "Apache-2.0"
readme = "README.md"
repository = "https://github.com/apache/arrow-datafusion"
rust-version = "1.70"
-version = "32.0.0"
+version = "33.0.0"

[workspace.dependencies]
arrow = { version = "48.0.0", features = ["prettyprint"] }
arrow-array = { version = "48.0.0", default-features = false, features = ["chrono-tz"] }
arrow-buffer = { version = "48.0.0", default-features = false }
arrow-flight = { version = "48.0.0", features = ["flight-sql-experimental"] }
arrow-ord = { version = "48.0.0", default-features = false }
arrow-schema = { version = "48.0.0", default-features = false }
-parquet = { version = "48.0.0", features = ["arrow", "async", "object_store"] }
-sqlparser = { version = "0.38.0", features = ["visitor"] }
+async-trait = "0.1.73"
+bigdecimal = "0.4.1"
+bytes = "1.4"
+ctor = "0.2.0"
+datafusion = { path = "datafusion/core" }
+datafusion-common = { path = "datafusion/common" }
+datafusion-expr = { path = "datafusion/expr" }
+datafusion-sql = { path = "datafusion/sql" }
+datafusion-optimizer = { path = "datafusion/optimizer" }
+datafusion-physical-expr = { path = "datafusion/physical-expr" }
+datafusion-physical-plan = { path = "datafusion/physical-plan" }
+datafusion-execution = { path = "datafusion/execution" }
+datafusion-proto = { path = "datafusion/proto" }
+datafusion-sqllogictest = { path = "datafusion/sqllogictest" }
+datafusion-substrait = { path = "datafusion/substrait" }
+dashmap = "5.4.0"
+doc-comment = "0.3"
+env_logger = "0.10"
+futures = "0.3"
+half = "2.2.1"
+indexmap = "2.0.0"
+itertools = "0.11"
+log = "^0.4"
+num_cpus = "1.13.0"
+object_store = { version = "0.7.0", default-features = false }
+parking_lot = "0.12"
+parquet = { version = "48.0.0", default-features = false, features = ["arrow", "async", "object_store"] }
+rand = "0.8"
+rstest = "0.18.0"
+serde_json = "1"
+sqlparser = { version = "0.39.0", features = ["visitor"] }
+tempfile = "3"
+thiserror = "1.0.44"
+chrono = { version = "0.4.31", default-features = false }
+url = "2.2"

[profile.release]
codegen-units = 1
@@ -74,3 +108,4 @@ opt-level = 3
overflow-checks = false
panic = 'unwind'
rpath = false

2 changes: 2 additions & 0 deletions README.md
@@ -47,6 +47,7 @@ Default features:
- `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
- `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
- `encoding_expressions`: `encode` and `decode` functions
+- `parquet`: support for reading the [Apache Parquet] format
- `regex_expressions`: regular expression functions, such as `regexp_match`
- `unicode_expressions`: Include unicode aware functions such as `character_length`

@@ -59,6 +60,7 @@ Optional features:
- `simd`: enable arrow-rs's manual `SIMD` kernels (requires Rust `nightly`)

[apache avro]: https://avro.apache.org/
+[apache parquet]: https://parquet.apache.org/

## Rust Version Compatibility

20 changes: 10 additions & 10 deletions benchmarks/Cargo.toml
@@ -18,7 +18,7 @@
[package]
name = "datafusion-benchmarks"
description = "DataFusion Benchmarks"
-version = "32.0.0"
+version = "33.0.0"
edition = { workspace = true }
authors = ["Apache Arrow <dev@arrow.apache.org>"]
homepage = "https://github.com/apache/arrow-datafusion"
@@ -34,20 +34,20 @@ snmalloc = ["snmalloc-rs"]

[dependencies]
arrow = { workspace = true }
-datafusion = { path = "../datafusion/core", version = "32.0.0" }
-datafusion-common = { path = "../datafusion/common", version = "32.0.0" }
-env_logger = "0.10"
-futures = "0.3"
-log = "^0.4"
+datafusion = { path = "../datafusion/core", version = "33.0.0" }
+datafusion-common = { path = "../datafusion/common", version = "33.0.0" }
+env_logger = { workspace = true }
+futures = { workspace = true }
+log = { workspace = true }
mimalloc = { version = "0.1", optional = true, default-features = false }
-num_cpus = "1.13.0"
-parquet = { workspace = true }
+num_cpus = { workspace = true }
+parquet = { workspace = true, default-features = true }
serde = { version = "1.0.136", features = ["derive"] }
-serde_json = "1.0.78"
+serde_json = { workspace = true }
snmalloc-rs = { version = "0.3", optional = true }
structopt = { version = "0.3", default-features = false }
test-utils = { path = "../test-utils/", version = "0.1.0" }
tokio = { version = "^1.0", features = ["macros", "rt", "rt-multi-thread", "parking_lot"] }

[dev-dependencies]
-datafusion-proto = { path = "../datafusion/proto", version = "32.0.0" }
+datafusion-proto = { path = "../datafusion/proto", version = "33.0.0" }
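The benchmarks manifest moves several entries (`env_logger`, `futures`, `log`, `num_cpus`, `serde_json`) to `{ workspace = true }`, while others keep their own version. A rough way to see which entries in a manifest still carry their own version or path is sketched below. `non_workspace_deps` is a hypothetical helper, not part of this PR. It assumes one dependency per line, as in the manifests shown here, and it ignores target-specific dependency tables.

```shell
#!/usr/bin/env bash
# Sketch: list dependency names in a Cargo manifest that do not inherit
# from [workspace.dependencies]. `non_workspace_deps` is a hypothetical
# helper for illustration, not part of the DataFusion repository.
non_workspace_deps() {
  local manifest="$1"
  awk '
    # A new table header: remember whether it is a dependency table.
    /^\[/ { in_deps = ($0 ~ /^\[(dev-|build-)?dependencies\]/); next }
    # Inside a dependency table, print the key of every entry that does
    # not mention `workspace = true`.
    in_deps && /^[A-Za-z0-9_-]+[ \t]*=/ && $0 !~ /workspace[ \t]*=[ \t]*true/ {
      split($0, parts, /[ \t]*=/)
      print parts[1]
    }
  ' "$manifest"
}

# Usage (path taken from the diff above):
#   non_workspace_deps benchmarks/Cargo.toml
```

Entries that this prints are not necessarily mistakes: a crate that appears in only one member, or a path dependency that needs an explicit version for publishing, may reasonably stay local.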
35 changes: 35 additions & 0 deletions ci/scripts/rust_example.sh
@@ -0,0 +1,35 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -ex
cd datafusion-examples/examples/
cargo fmt --all -- --check

files=$(ls .)
for filename in $files
do
example_name=`basename $filename ".rs"`
# Skip tests that rely on external storage and flight
# todo: Currently, catalog.rs is placed in the external-dependence directory because there is a problem parsing
# the parquet file of the external parquet-test that it currently relies on.
# We will wait for this issue[https://github.com/apache/arrow-datafusion/issues/8041] to be resolved.
if [ ! -d $filename ]; then
cargo run --example $example_name
fi
done
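The loop above derives example names by parsing `ls` output and skips directories with an unquoted `-d` test, which works as long as no file name contains whitespace. A more defensive variant might look like the sketch below. `list_examples` is a hypothetical helper, not part of this PR, and unlike the original loop it only considers `*.rs` files rather than every non-directory entry.

```shell
#!/usr/bin/env bash
# Sketch only: a more defensive take on the example-discovery loop.
# `list_examples` is a hypothetical helper, not part of the PR.

# Print the example name for every top-level *.rs file in the given
# directory. Subdirectories and other file types are skipped.
list_examples() {
  local dir="$1" path
  for path in "$dir"/*.rs; do
    # Skip when the glob matched nothing, or matched something that is
    # not a regular file.
    [ -f "$path" ] || continue
    basename "$path" .rs
  done
}

# Usage, assuming the repository layout used by the script above:
#   list_examples datafusion-examples/examples | while read -r name; do
#     cargo run --example "$name" < /dev/null
#   done
```

Globbing instead of `ls` keeps names with spaces intact, and quoting every expansion avoids word splitting. Redirecting stdin from `/dev/null` in the usage loop is a precaution so that an example cannot consume the list of names being read.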