Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-44239][SQL][FOLLOWUP] Do not disable vector memory optimization when hugeVectorThreshold=0 to align its document #47988

Closed
wants to merge 1 commit into from

Conversation

wankunde
Copy link
Contributor

@wankunde wankunde commented Sep 4, 2024

What changes were proposed in this pull request?

Minor fix: Update default hugeVectorThreshold from 0 to -1 in code to match the documentation.
Keep consistent with the documentation

Why are the changes needed?

Keep code and documentation consistent

Does this PR introduce any user-facing change?

No

How was this patch tested?

Exists UT

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 4, 2024
@yaooqinn yaooqinn changed the title [SPARK-44239][SQL][MINOR] Update default hugeVectorThreshold from 0 to -1 in code [SPARK-44239][SQL] Update default hugeVectorThreshold from 0 to -1 in code Sep 4, 2024
@yaooqinn
Copy link
Member

yaooqinn commented Sep 4, 2024

Also cc @cloud-fan who reviewed and merged #41782 / SPARK-44239

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions, @wankunde .

  • The AS-IS PR title, Update default hugeVectorThreshold from 0 to -1 in code, could be a little misleading because the default value means the following. And, this is also a code. Do you think you can revise the PR title more intuitively?

buildConf("spark.sql.inMemoryColumnarStorage.hugeVectorThreshold")
.doc("When the required memory is larger than this, spark reserves required memory * " +
s"${VECTORIZED_HUGE_VECTOR_RESERVE_RATIO.key} memory next time and release this column " +
s"vector memory before reading the next batch rows. -1 means disabling the optimization.")
.version("4.0.0")
.bytesConf(ByteUnit.BYTE)
.createWithDefault(-1)

  • Although it seems to be difficult to verify, do you think we can add a test coverage to verify your new code and prevent a future regression at this part?
- if (hugeVectorThreshold > 0 && capacity > hugeVectorThreshold) {
+ if (hugeVectorThreshold > -1 && capacity > hugeVectorThreshold) {

@wankunde wankunde changed the title [SPARK-44239][SQL] Update default hugeVectorThreshold from 0 to -1 in code [SPARK-44239][SQL] Update default hugeVectorThreshold from 0 to -1 Sep 5, 2024
@yaooqinn
Copy link
Member

yaooqinn commented Sep 5, 2024

Hi, @dongjoon-hyun

could be a little misleading because the default value means the following. And, this is also a code

When setting spark.sql.inMemoryColumnarStorage.hugeVectorThreshold=0, the code does not work as the doc

@yaooqinn yaooqinn changed the title [SPARK-44239][SQL] Update default hugeVectorThreshold from 0 to -1 [SPARK-44239][SQL][FOLLOWUP] Do not disable vector memory optimization when hugeVectorThreshold=0 to align its document Sep 5, 2024
@@ -72,7 +72,7 @@ public void reset() {
numNulls = 0;
}

if (hugeVectorThreshold > 0 && capacity > hugeVectorThreshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= 0 means we treat all vectors as huge vectors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but = 0 is not recommended, just keep code and documentation consistent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then shall we update the doc instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the doc will result in code part change as other places test if hugeVectorThreshold < 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see

@yaooqinn yaooqinn closed this in e9a6230 Sep 6, 2024
@yaooqinn
Copy link
Member

yaooqinn commented Sep 6, 2024

Merged to master. Thank you @wankunde and all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants