[SPARK-50080][SQL][TESTS] Add benchmark cases for parquet adaptive bloom filter in BloomFilterBenchmark #48609

yaooqinn · 2024-10-23T02:24:30Z

What changes were proposed in this pull request?

Parquet's AdaptiveBlockSplitBloomFilter generates a bloom filter whose bit size is chosen according to the number of distinct values actually present in the data. This may come at a cost, because it maintains multiple BloomFilter candidates at runtime, which could increase CPU usage or elapsed time.

This pull request adds benchmark cases for the adaptive bloom filter so that it can be compared with the existing cases that use the default BloomFilter size.
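To make the trade-off concrete, here is a minimal sketch of the idea behind adaptive sizing. It is not Parquet's implementation: the function names are hypothetical, the capacity estimate uses the textbook bloom filter formula rather than Parquet's block-split sizing, and the halving scheme for candidate sizes is an assumption made for illustration. The point is only that several candidate sizes are tracked and the smallest one that still fits the observed distinct count is kept.

```python
import math

LN2_SQUARED = math.log(2) ** 2


def capacity_ndv(num_bytes, fpp):
    """Distinct values a filter of num_bytes can hold at false-positive rate fpp.

    Textbook estimate n = m * (ln 2)^2 / -ln(p); an assumption for this
    sketch, not the formula Parquet uses for its block-split filters.
    """
    return int(num_bytes * 8 * LN2_SQUARED / -math.log(fpp))


def candidate_sizes(max_bytes, num_candidates):
    """Hypothetical candidate sizes: max_bytes, then repeated halving."""
    return [max_bytes >> i for i in range(num_candidates)]


def pick_candidate(distinct_count, max_bytes, num_candidates, fpp=0.01):
    """Return the smallest candidate size (bytes) that fits distinct_count.

    Falls back to the largest candidate when none is big enough.
    """
    for size in sorted(candidate_sizes(max_bytes, num_candidates)):
        if capacity_ndv(size, fpp) >= distinct_count:
            return size
    return max_bytes


if __name__ == "__main__":
    for ndv in (1_000, 100_000, 10_000_000):
        print(ndv, pick_candidate(ndv, 1 << 20, 5))
```

In this model every inserted value would have to be written to all live candidates, which is the extra runtime work the new benchmark cases are meant to measure against a single fixed-size filter.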

Why are the changes needed?

Improve benchmark coverage for common user-oriented features of the Parquet data source.

Does this PR introduce any user-facing change?

no

How was this patch tested?

The benchmark golden files are attached.

Was this patch authored or co-authored using generative AI tooling?

no

yaooqinn · 2024-10-23T06:48:58Z

cc @dongjoon-hyun @LuciferYang thanks

LuciferYang

+1, LGTM

HyukjinKwon · 2024-10-23T07:33:18Z

Merged to master.

dongjoon-hyun

+1, late LGTM.

yaooqinn · 2024-10-25T03:12:31Z

Thank you @HyukjinKwon @dongjoon-hyun @LuciferYang

yaooqinn added 2 commits October 22, 2024 19:41

benchmarking adaptive_bf

8c0e686

benchmarking adaptive_bf

61b0386

github-actions bot added the SQL label Oct 23, 2024

LuciferYang approved these changes Oct 23, 2024

HyukjinKwon approved these changes Oct 23, 2024

HyukjinKwon closed this in 2cb7a16 Oct 23, 2024

dongjoon-hyun reviewed Oct 24, 2024

yaooqinn deleted the SPARK-50080 branch October 25, 2024 03:12
