-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HDFS-16740. Mini cluster test flakiness #4835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@steveloughran You mentioned that the test system could use some attention. The set of changes in this pull request, using an updated MiniDFSCluster that leverages temporary folders and ensuring that the cluster is shutdown at the end of the test, could be applied to the remaining tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really needs hdfs reviewer i'm afraid -but i've had a look
We can't add junit dependencies to the miniclusters as they get used in different places outside our codebase and adding a new dependency could break them. sorry
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
Show resolved
Hide resolved
...fs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniJournalCluster.java
Outdated
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy. yes, it's a bit more verbose -but less likely to cause problems for others.
I still think we need an hdfs developer to review though -can you ask on hdfs-dev list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good. Appreciate your efforts on improving this! This would be really helpful!
...fs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommissionWithStriped.java
Outdated
Show resolved
Hide resolved
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
Show resolved
Hide resolved
indeed, |
From what I've seen, many of the timeouts are related to:
This fix is focussed on both issues, as it uses a In addition, I've been submitting PRs that address some of the threading issues (e.g. HDFS-16904 and HADOOP-18279. |
Thanks @snmvaughan for you nice idea, comments from @xinglin seems make sense ,Could you review and resubmit new changes ? |
💔 -1 overall
This message was automatically generated. |
There are 5 other test classes which create a
|
💔 -1 overall
This message was automatically generated. |
Add a constructor to the mini cluster constructors and builders that takes a TemporaryFolder. This encourages the use of managed temporary folders that are unique to the test and automatically cleaned up. The field is named "baseDir" to communicate that this folder acts as the base directory for test data. These subset of tests recently demonstrated flaky behavior for multiple pull requests. Switch to using a temporary folder managed by the test class, and ensure that the test cluster is shutdown using either @after or try-with-resources. Using the new constructor encourages the consistent use of the managed temporary folder. Allow the same baseDir to be provided multiple times, simplifying cluster restarts. This simplifies these tests as the use of the same TemporaryFolder will always work (even for restarts) This change also reduces the number of tests that make assumptions about the mini clusters using encapsulation. These changes avoid configurations that directly use HDFS_MINIDFS_BASEDIR.
Replace TemporaryFolder parameter with File parameter by using ".getRoot()". Replace JUnit Assert.fail with AssertionError.
The outcome when compared to FileSystem.get is the same, but this makes it clear that it is a LocalFileSystem instance.
495b92d
to
e9103cf
Compare
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -couple of comments
...fs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniJournalCluster.java
Show resolved
Hide resolved
...hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
Show resolved
Hide resolved
...ect/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! This improvement is much needed!
I still remember that TestRollingUpgrade failed frequently/non-deterministically for my previous PRs.
@steveloughran Is there anything you are waining for or are we ready to merge this PR? I want it to be merged soon! Thanks, |
@steveloughran/ @goiri could we merge this PR? Thanks, |
I tested this a bit and it seems safe. |
Thanks @goiri for moving this PR forward! |
Thanks! |
* Steve VaughanSteve Vaughan <email@stevevaughan.me> Co-authored-by: Tom McCormick <tmccormi@linkedin.com>
Contributed by Steve Vaughan <email@stevevaughan.me> Co-authored-by: Tom McCormick <tmccormi@linkedin.com>
Description of PR
Mini clusters used during HDFS unit tests are reporting test failures that do not appear to be directly related to submitted changes. The failures are the result of either interactions between tests run in parallel, or tests which share common disk space for tests. In all cases, the tests can be run individually serially without any errors. Addressing this issue will simplify future submissions by eliminating the confusion introduced by these unrelated test failures.
We can apply lessons recently from TestRollingUpgrade, which was recently patched to unblock a recent submission. The fixes involved changing the HDFS configuration to use temporary disk space for each individual tests, and using try-with-resources to ensure that clusters were shutdown cleanly.
Add a constructor to the mini cluster constructors and builders that can take a directory from a TemporaryFolder. This encourages the use of managed temporary folders that are unique to the test and automatically cleaned up. The field is named "baseDir" to communicate that this folder acts as the base directory for test data. Using the new constructor encourages the consistent use of the managed temporary folder.
Allow the same baseDir to be provided multiple times, simplifying the coding of cluster restarts. This simplifies these tests as the use of the same TemporaryFolder will always work (even for restarts)
This change also reduces the number of tests that make assumptions about the mini clusters using encapsulation. These changes avoid configurations that directly use HDFS_MINIDFS_BASEDIR.
How was this patch tested?
Both Maven Surefire and an IDE were used to demonstrate that the tests operate as expected.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?