FIX : some benchmarks are failing #15367

getChan · 2025-03-23T16:24:08Z

Which issue does this PR close?

Closes #15213 (distinct_query_sql benchmark is failing).

Rationale for this change

It is not yet certain, but it seems that plan creation and collect() should share the same runtime. The presumed cause is that RepartitionExec polls lazily within the runtime, so a plan created under one runtime and collected under another can fail.

I will add more details if I find anything else.

What changes are included in this PR?

Move Runtime::new() into bench_function.
Ensure that plan creation (ctx.sql()) and collect() share the same runtime.

Are these changes tested?

Yes. The following benchmarks now succeed:

cargo bench -p datafusion --bench topk_aggregate
cargo bench -p datafusion --bench distinct_query_sql

Are there any user-facing changes?

No.

alamb

Thanks for this @getChan

alamb · 2025-03-25T20:59:54Z

datafusion/core/benches/distinct_query_sql.rs

-        |b| b.iter(|| run(distinct_trace_id_100_partitions_100_000_samples_limit_100.0.clone(),
-                                   distinct_trace_id_100_partitions_100_000_samples_limit_100.1.clone())),
+        |b| b.iter(|| {
+            let rt = Runtime::new().unwrap();


I think this means that the benchmark will include the time to create each tokio runtime (with a bunch of threads, etc.).

To avoid this, I think you can create the runtime once and then use it for each iteration:

let rt = Runtime::new().unwrap();
c.bench_function(
    format!("distinct query with {} partitions and {} samples per partition with limit {}", partitions, samples, limit).as_str(),
    |b| b.iter(|| {
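The cost difference the review describes can be illustrated without tokio or criterion. The sketch below is a std-only toy, not DataFusion code: ToyRuntime is a hypothetical stand-in for tokio's Runtime (a single worker thread instead of a pool), run_query is a hypothetical placeholder for planning plus collect(), and the timing loop is hand-rolled rather than criterion's b.iter. bench_shared builds the runtime once outside the loop, while bench_per_iteration builds one inside every iteration; a counter records how many runtimes were built so the extra setup is visible without relying on timings.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Counts how many toy runtimes have been built (deterministic, unlike timings).
static RUNTIMES_BUILT: AtomicUsize = AtomicUsize::new(0);

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical stand-in for tokio's Runtime: one worker thread fed by a channel.
/// Building it spawns a thread, which is the setup cost to keep out of the timed loop.
struct ToyRuntime {
    tx: Option<mpsc::Sender<Job>>,
    worker: Option<thread::JoinHandle<()>>,
}

impl ToyRuntime {
    fn new() -> Self {
        RUNTIMES_BUILT.fetch_add(1, Ordering::SeqCst);
        let (tx, rx) = mpsc::channel::<Job>();
        // The worker runs jobs until every sender has been dropped.
        let worker = thread::spawn(move || {
            for job in rx {
                job();
            }
        });
        ToyRuntime { tx: Some(tx), worker: Some(worker) }
    }

    /// Runs `f` on the worker thread and blocks until it returns.
    fn block_on<T: Send + 'static>(&self, f: impl FnOnce() -> T + Send + 'static) -> T {
        let (done_tx, done_rx) = mpsc::channel();
        self.tx
            .as_ref()
            .expect("runtime is running")
            .send(Box::new(move || {
                let _ = done_tx.send(f());
            }))
            .expect("worker is alive");
        done_rx.recv().expect("worker finished the job")
    }
}

impl Drop for ToyRuntime {
    fn drop(&mut self) {
        // Closing the channel ends the worker loop; then wait for the thread.
        drop(self.tx.take());
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

/// Hypothetical query: a placeholder for planning plus collect().
fn run_query() -> u64 {
    (1..=1_000u64).sum()
}

/// Suggested pattern: one runtime, reused by every iteration.
fn bench_shared(iters: usize) -> (Duration, u64) {
    let rt = ToyRuntime::new();
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        last = rt.block_on(run_query);
    }
    (start.elapsed(), last)
}

/// Pattern from the first revision: runtime setup and teardown are timed too.
fn bench_per_iteration(iters: usize) -> (Duration, u64) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        let rt = ToyRuntime::new();
        last = rt.block_on(run_query);
    }
    (start.elapsed(), last)
}

fn main() {
    let (shared, a) = bench_shared(100);
    let (per_iter, b) = bench_per_iteration(100);
    assert_eq!(a, b);
    println!("shared runtime:        {:?}", shared);
    println!("runtime per iteration: {:?}", per_iter);
    println!("runtimes built: {}", RUNTIMES_BUILT.load(Ordering::SeqCst));
}
```

Only the construction count is deterministic here; the printed durations depend on the machine, but the per-iteration variant also pays a thread spawn and join on every pass, which is the overhead the review wants kept out of the measurement.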

getChan · Mar 26, 2025

Got it. I changed the code so that the runtime is created once and shared across iterations.

However, it seems that other benchmarks still create a runtime inside each iteration.

github-actions bot added the core Core DataFusion crate label Mar 23, 2025

alamb reviewed Mar 25, 2025


getChan added 3 commits March 27, 2025 00:33

distinct_query_sql, topk_aggregate

69b8e4a

cargo clippy

71eefa7

cargo fmt

c36e781

getChan force-pushed the fix-repartition-bench-bug branch from c7414b2 to c36e781 Compare March 26, 2025 15:37

github-actions bot removed functions Changes to functions implementation datasource Changes to the datasource crate labels Mar 26, 2025

share runtime

0365d70
