[SPARK-38045][SS][TEST] More strict validation on plan check for stream-stream join unit test #35341
cc. @tdas @zsxwing @viirya @xuanyuanking @cloud-fan Please take a look. Thanks!
It doesn't fail with the existing codebase (with SPARK-35703) since it doesn't read from a bucketed source. I'll see whether I can run the same test against a bucketed source.
    }
  }

  val numPartitions = spark.sqlContext.conf.getConf(SQLConf.SHUFFLE_PARTITIONS)
This is based on the precondition that the number of state partitions is the same as the number of shuffle partitions. If the state can maintain its own number of partitions, we will have to change the distribution requirement of the stateful operator, and hence this check as well.
I tried a managed table with the file source, which supports bucketing, but it looks like the file stream source does not pick up the bucket info even when it goes through the managed table; hence the output partitioning of the source is UnknownPartitioning.
If we want to test with SPARK-35703, we may need a source that supports bucket scan in streaming. I'm not sure we have one among the built-in sources.
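To make the idea of a stricter plan check concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `Plan`, `Source`, `Shuffle`, `StreamStreamJoin`, the simplified `HashPartitioning`/`UnknownPartitioning`, and `strictlyDistributedJoins` are all hypothetical stand-ins invented for illustration, and the real test may be written differently. It only shows the shape of the check: both join inputs must be shuffles hash-partitioned on the full join key, with the partition count equal to the configured number of shuffle partitions.

```scala
// Hypothetical, simplified model of a physical plan. These are NOT Spark's
// classes; they only mirror the concepts discussed in the comment above.
object PlanCheckSketch {
  sealed trait Partitioning
  case class HashPartitioning(columns: Seq[String], numPartitions: Int) extends Partitioning
  case object UnknownPartitioning extends Partitioning

  sealed trait Plan { def children: Seq[Plan] }
  case class Source(name: String) extends Plan {
    val children: Seq[Plan] = Nil
  }
  case class Shuffle(partitioning: Partitioning, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }
  case class StreamStreamJoin(keys: Seq[String], left: Plan, right: Plan) extends Plan {
    val children: Seq[Plan] = Seq(left, right)
  }

  // Pre-order traversal that gathers every node matched by the partial function.
  def collect[T](plan: Plan)(pf: PartialFunction[Plan, T]): Seq[T] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(collect(_)(pf))

  // Strict check: a join counts only if BOTH children are shuffles that are
  // hash-partitioned on exactly the expected keys and with exactly the
  // expected number of partitions (assumed equal to the state partition count).
  def strictlyDistributedJoins(
      plan: Plan,
      keys: Seq[String],
      numPartitions: Int): Seq[StreamStreamJoin] =
    collect(plan) {
      case j @ StreamStreamJoin(_, Shuffle(l: HashPartitioning, _), Shuffle(r: HashPartitioning, _))
          if l.columns == keys && r.columns == keys &&
            l.numPartitions == numPartitions && r.numPartitions == numPartitions => j
    }
}
```

A loose check that only looks for a join node would accept a plan whose inputs are partitioned on a subset of the join keys or with a different partition count; the guard above rejects both cases.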
Although this is a kind of follow-up, I created a new JIRA for this because the previous one was already released in 3.2.0 and this one will be resolved in 3.2.2. This gives us better traceability.
Let me merge this first, because the remaining work is worth having another JIRA. Thank you, @HeartSaVioR and all.
Could you make a backport PR to branch-3.2, @HeartSaVioR? There is a small conflict in branch-3.2.
Thanks for reviewing and merging! I'll submit a PR against 3.2 sooner rather than later.
…am-stream join unit test

This PR is a follow-up of SPARK-35693 to enhance the unit test on stream-stream join to be more strict on plan check.

We would like to be more strict on plan check so that requirement of distribution against stream-stream join is fulfilled.

No, test only.

Modified test passed.

Closes apache#35341 from HeartSaVioR/SPARK-35693-followup.

Authored-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(just for better traceability) #35347 for 3.2
What changes were proposed in this pull request?
This PR is a follow-up of SPARK-35693 that makes the stream-stream join unit test stricter in its plan check.
Why are the changes needed?
We would like the plan check to be stricter so that it verifies the distribution requirement of stream-stream join is fulfilled.
Does this PR introduce any user-facing change?
No, test only.
How was this patch tested?
Modified test passed.