-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-44239][SQL][FOLLOWUP] Do not disable vector memory optimization when hugeVectorThreshold=0 to align its document #47988
Conversation
Also cc @cloud-fan who reviewed and merged #41782 / SPARK-44239 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions, @wankunde .
- The AS-IS PR title,
Update default hugeVectorThreshold from 0 to -1 in code
, could be a little misleading because the default value means the following. And, this is also acode
. Do you think you can revise the PR title more intuitively?
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Lines 506 to 512 in 2ed6c3e
buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorThreshold") | |
.doc("When the required memory is larger than this, spark reserves required memory * " + | |
s"${VECTORIZED_HUGE_VECTOR_RESERVE_RATIO.key} memory next time and release this column " + | |
s"vector memory before reading the next batch rows. -1 means disabling the optimization.") | |
.version("4.0.0") | |
.bytesConf(ByteUnit.BYTE) | |
.createWithDefault(-1) |
- Although it seems to be difficult to verify, do you think we can add a test coverage to verify your new code and prevent a future regression at this part?
- if (hugeVectorThreshold > 0 && capacity > hugeVectorThreshold) {
+ if (hugeVectorThreshold > -1 && capacity > hugeVectorThreshold) {
Hi, @dongjoon-hyun
When setting spark.sql.inMemoryColumnarStorage.hugeVectorThreshold=0, the code does not work as the doc |
@@ -72,7 +72,7 @@ public void reset() { | |||
numNulls = 0; | |||
} | |||
|
|||
if (hugeVectorThreshold > 0 && capacity > hugeVectorThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= 0
means we treat all vectors as huge vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but = 0
is not recommended, just keep code and documentation consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then shall we update the doc instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the doc will result in code part change as other places test if hugeVectorThreshold < 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see
Merged to master. Thank you @wankunde and all |
What changes were proposed in this pull request?
Minor fix: Update default hugeVectorThreshold from 0 to -1 in code to match the documentation.
Keep consistent with the documentation
Why are the changes needed?
Keep code and documentation consistent
Does this PR introduce any user-facing change?
No
How was this patch tested?
Exists UT
Was this patch authored or co-authored using generative AI tooling?
No