WIP: Reduce sort memory usage v2 by @richox #2134

Closed
wants to merge 5 commits into from

Conversation

@yjshen (Member) commented Apr 2, 2022

Which issue does this PR close?

Closes #.

Rationale for this change

Another attempt to reduce memory usage during sorting.
Mainly authored by @richox.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the ballista and datafusion labels Apr 2, 2022
@yjshen (Member, Author) commented Apr 2, 2022

cargo run --release --features "mimalloc" --bin tpch -- benchmark datafusion --iterations 3 --path ../tpch-parquet/ --format parquet --query 1 --batch-size 4096

Without this PR:

Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "../../tpch-parquet/", file_format: "parquet", mem_table: false, output_path: None }
Query 1 iteration 0 took 2851.7 ms and returned 6001214 rows
Query 1 iteration 1 took 2817.7 ms and returned 6001214 rows
Query 1 iteration 2 took 2735.9 ms and returned 6001214 rows
Query 1 avg time: 2801.75 ms

With this PR:

    Finished release [optimized] target(s) in 0.07s
     Running `target/release/tpch benchmark datafusion --iterations 3 --path ../tpch-parquet/ --format parquet --query 1 --batch-size 4096`
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 3, partitions: 2, batch_size: 4096, path: "../tpch-parquet/", file_format: "parquet", mem_table: false, output_path: None }
Query 1 iteration 0 took 36483.0 ms and returned 6001214 rows
Query 1 iteration 1 took 36783.2 ms and returned 6001214 rows
Query 1 iteration 2 took 36400.4 ms and returned 6001214 rows
Query 1 avg time: 36555.52 ms

There appears to be a serious performance regression: the average query time goes from about 2.8 s to about 36.6 s, roughly 13x slower 😅
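
To put a number on the gap, here is a minimal sketch (not part of this PR) that averages the per-iteration timings reported above and computes the slowdown factor. The helper names `mean` and `slowdown` are illustrative assumptions and are not DataFusion APIs.

```rust
/// Arithmetic mean of a slice of timings in milliseconds.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// How many times slower `candidate` is than `baseline`, comparing means.
fn slowdown(baseline: &[f64], candidate: &[f64]) -> f64 {
    mean(candidate) / mean(baseline)
}

fn main() {
    // Per-iteration timings (ms) copied from the two benchmark runs above.
    let without_pr = [2851.7, 2817.7, 2735.9];
    let with_pr = [36483.0, 36783.2, 36400.4];

    println!("without PR: {:.2} ms", mean(&without_pr));
    println!("with PR:    {:.2} ms", mean(&with_pr));
    println!("slowdown:   {:.1}x", slowdown(&without_pr, &with_pr));
}
```

With only three iterations per run this is a rough comparison, but the difference is far larger than the run-to-run variation in either set of timings.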

@alamb (Contributor) commented Apr 2, 2022

This looks very cool -- thank you @yjshen and @richox for working on this

@yjshen (Member, Author) commented Apr 4, 2022

I am closing it for now because of the performance regression.
