Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-28770][CORE][TESTS]Ignore SparkListenerStageExecutorMetrics in testApplicationReplay test #25659

Closed

Conversation

bzhaoopenstack
Copy link
Contributor

What changes were proposed in this pull request?

This patch fix the testApplicationReplay test in ReplayListenerSuite.
We compare the event log json except SparkListenerStageExecutorMetrics.

Why are the changes needed?

As for potentail different log events Issue, we found this can be reproduced in X86. This due to the SparkListenerStageExecutorMetrics events may be logged in different order. So the test would be fail in some time. But this is a particular issue for Arm platform. For more details, please see
https://issues.apache.org/jira/browse/SPARK-28770

Does this PR introduce any user-facing change?

No

How was this patch tested?

This is existing UTs, we can test it with the same way like we usually do.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@bzhaoopenstack
Copy link
Contributor Author

@wypoon @srowen
Hi guys, cloud you please have a look at this patch? Thank you very much

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of ignoring them, is there any natural sort ordering you can impose to make the comparison valid? does the test fail right now?

@wypoon
Copy link
Contributor

wypoon commented Sep 4, 2019

SparkListenerStageExecutorMetrics were introduced in SPARK-23429 (#21221). By design, executor metrics update events are not logged by the EventLoggingListener. Instead, the listener keeps track of the per stage peaks for any of the executors and the driver for which it has received metrics. On stage completion, the peaks for the stage are logged in SparkListenerStageExecutorMetrics events for each of these executors and driver.
Since executor metrics update events are not logged in the event log, they do not get replayed. Thus the listener for the replay never sees metrics updates. It is therefore valid to exclude SparkListenerStageExecutorMetrics events from both the original and the replay for the purpose of comparison.

However, instead of excluding all SparkListenerStageExecutorMetrics events from both the original EventLoggingListener and the replay listener, we can have a finer-grained fix, which I have proposed in #25673 for comparison. It should be sufficient to exclude any SparkListenerStageExecutorMetrics events for the driver. This is because with SPARK-26329 (#23767), executor metrics are also sent in task end events (which do get replayed), so the EventLoggingListener always receives metrics for the executors (just not necessarily for the driver), and thus SparkListenerStageExecutorMetrics events for the executors always get logged.

// If we are logging stage executor metrics, there is a bulk call to logEvent with
// SparkListenerStageExecutorMetrics events via a Map.foreach. The Map.foreach bulk
// operation may not log the events with the same order. So here we should not compare
// SparkListenerStageExecutorMetrics here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is based on my earlier analysis in the JIRA, which is not quite correct.

@HeartSaVioR
Copy link
Contributor

#25673 supersedes this and addresses root issue. Could you take a look at #25673 and test with ARM CI, and close this PR if #25673 works? Thanks in advance!

@bzhaoopenstack
Copy link
Contributor Author

Thanks very much, guys. #25673 works well on ARM.
Also thanks to @wypoon, you give us a very clear failure reason we face. Thanks

@wypoon
Copy link
Contributor

wypoon commented Sep 5, 2019

@bzhaoopenstack, thanks for the verification.

dongjoon-hyun pushed a commit that referenced this pull request Nov 4, 2023
### What changes were proposed in this pull request?
This pr upgrade Apache Arrow from 13.0.0 to 14.0.0.

### Why are the changes needed?
The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes.
‎
In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark([GH-36332](apache/arrow#36332)) and problems with importing empty data arrays([GH-37056](apache/arrow#37056)). It also optimizes the process of appending variable length vectors([GH-37829](apache/arrow#37829)) and includes C++ libraries for MacOS AARCH 64 in Java-Jars([GH-38076](apache/arrow#38076)).
‎
The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types([GH-25659](apache/arrow#25659)), support for extended expressions in ScannerBuilder([GH-34252](apache/arrow#34252)), and the exposure of the VectorAppender class([GH-37246](apache/arrow#37246)).
‎
The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21([GH-36994](apache/arrow#36994)). In addition, the release introduces vector validation consistent with C++, ensuring consistency across different languages([GH-37702](apache/arrow#37702)).
‎
Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods([GH-37705](apache/arrow#37705)), and VarCharWriter now supports writing from `Text` and `String`([GH-37706](apache/arrow#37706)). The release also adds typed getters for StructVector, improving the ease of accessing data([GH-37863](apache/arrow#37863)).

The full release notes as follows:
- https://arrow.apache.org/release/14.0.0.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43650 from LuciferYang/arrow-14.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants