Create a config param and session properties for setting Parquet's max read block row count #15474
What's wrong with using
Can you also share some info about the regressing query? I don't see how this could significantly increase inputDataSize, since it only affects the reader's output.
Includes a config param and session properties for Hive, Iceberg, and Delta Lake. In some cases, the previously hard-coded 8k value performs much worse than a smaller one such as 1k. A larger value may save some CPU time, but it can also lead to more lazy block decoding, a larger inputDataSize, and significant query performance degradation. In the future, this param should be adaptive.
Description
Recently (#15257), we increased the hard-coded value from 1k to 8k, but in some cases 8k is much worse than 1k. A larger value may save some CPU time, but it can also lead to more lazy block decoding, a larger inputDataSize, and significant query performance degradation.
In the future, this param should be adaptive.
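To illustrate the trade-off, here is a minimal sketch of how a configurable cap on rows per read block could be resolved and applied. It is not the code in this PR: the class name, method names, bounds, and the rule that a session value overrides the config value are all assumptions made for illustration.

```java
import java.util.OptionalInt;

// Hypothetical sketch; names and bounds are illustrative and not taken from this PR.
public final class MaxReadBlockRowCountSketch {
    // Assumed bounds and default (8k mirrors the value discussed above).
    public static final int MIN_ROW_COUNT = 128;
    public static final int MAX_ROW_COUNT = 65_536;
    public static final int DEFAULT_ROW_COUNT = 8 * 1024;

    private MaxReadBlockRowCountSketch() {}

    // Assumption: a session-level value, when present, overrides the catalog-level config value.
    public static int effectiveMaxRowCount(int configValue, OptionalInt sessionValue) {
        return validate(sessionValue.orElse(configValue));
    }

    // Rejects values outside the assumed bounds.
    public static int validate(int value) {
        if (value < MIN_ROW_COUNT || value > MAX_ROW_COUNT) {
            throw new IllegalArgumentException("max read block row count out of range: " + value);
        }
        return value;
    }

    // Rows in the next block: the cap, or the rows left in the row group if fewer remain.
    public static int nextBlockRowCount(int maxRowCount, long remainingRows) {
        return (int) Math.min(maxRowCount, remainingRows);
    }

    // Number of blocks needed to read totalRows under the cap (ceiling division).
    public static long blockCount(long totalRows, int maxRowCount) {
        return (totalRows + maxRowCount - 1) / maxRowCount;
    }

    public static void main(String[] args) {
        int cap = effectiveMaxRowCount(DEFAULT_ROW_COUNT, OptionalInt.of(1024));
        System.out.println("effective cap: " + cap);
        System.out.println("blocks for 10,000 rows: " + blockCount(10_000, cap));
    }
}
```

With a smaller cap the reader produces more, smaller blocks, so a selective filter can skip lazy decoding of other columns at a finer granularity; with a larger cap there is less per-block overhead but more rows may be decoded per block.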
The following query took 36.83s on version 403 and now takes 59.76s (when the max vector length is 8k):
SELECT count(c_description60) FROM storesalesflat_mixed WHERE ss_customer_sk>64992484;
The table is based on tpcds and contains 10,678,104,288 rows.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: