
Test Zoltan2_OrderingScotch_MPI_4 timed out in CI testing on 3/16/2018 #2397

Closed

bartlettroscoe opened this issue Mar 16, 2018 · 8 comments
bartlettroscoe commented Mar 16, 2018

@trilinos/zoltan2, @trilinos/framework

Description

The test Zoltan2_OrderingScotch_MPI_4 timed out at 10 minutes in the first CI build Linux-GCC-4.8.4-MPI_RELEASE_DEBUG_SHARED_PT_CI of the morning as shown at:

which shows:

Starting everything
UserInputForTests, Read: ./simple_ordering.mtx
UserInputForTests, Read: ./simple_ordering_coord.mtx
NumRows     = 25
NumNonzeros = 25
NumProcs = 4
Ordering does not support distributed matrices.
Ordering does not support distributed matrices.
Going to solve
Ordering does not support distributed matrices.

It looks like one of the 4 nodes may be hanging. This error looks identical to the still-open issue #2131, for which this test should have been disabled in the standard CI build on 1/8/2018.
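A hang like this can often be reproduced locally by running just the suspect test under ctest with a hard timeout, so it fails fast instead of blocking the CI iteration. A minimal sketch, assuming you are in a configured Trilinos build tree with Zoltan2 tests enabled:

```shell
# Run only the suspect test, with verbose output on failure and a
# 600-second timeout so a hang turns into a reported timeout.
ctest -R Zoltan2_OrderingScotch_MPI_4 --timeout 600 --output-on-failure
```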

The test is also failing in the other build Linux-GCC-4.8.4-MPI_RELEASE_DEBUG_SHARED_PT_CI_AAOP that enables Scotch:

However, there are no changes pulled in the CI iteration shown at:

that would seem to account for this failure or that would have re-enabled this test.

So how did this test get re-enabled?

Related Issues

@bartlettroscoe

bartlettroscoe commented Mar 16, 2018

It looks like the only place this test is run is the CI builds that I set up. This test does not appear to run in the auto PR builds, or it would show up in the query:

@MicheldeMessieres

@bartlettroscoe I was testing this on the SEMS machine, but I thought I had only made local changes. I'm checking now, but I did not make any commit that I was aware of.

@bartlettroscoe

Looking at the query:

it looks like this test never got disabled in the post-push CI build. Instead, looking at commit 5c01d87, it only got disabled in the checkin-test-sems.sh script.

Therefore, it looks like we need to disable this test in the post-push CI build as well.

But unless Scotch gets enabled in the auto PR builds, I think we should disable Scotch in this CI build, since people are not pushing with the checkin-test-sems.sh script as much now that they are being told to use the auto PR build.
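For reference, TriBITS supports disabling an individual test at configure time via the `<fullTestName>_DISABLE` cache variable. A hypothetical reduction of such a configure invocation (not the exact CI script) might look like:

```shell
# Configure Trilinos with Zoltan2 tests on, but this one test disabled
# via the TriBITS <fullTestName>_DISABLE cache variable.
cmake \
  -D Trilinos_ENABLE_Zoltan2:BOOL=ON \
  -D Trilinos_ENABLE_TESTS:BOOL=ON \
  -D Zoltan2_OrderingScotch_MPI_4_DISABLE:BOOL=ON \
  ../Trilinos
```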

@MicheldeMessieres

@bartlettroscoe I just recently got back up and running on that SEMS machine. Using the ssh protocol you recommended worked for git, and the proxy exports @kddevin gave me restored outside access. We identified the hang issue (related to handling of 64-bit Scotch) for that test, and a resolution should be coming soon.

@bartlettroscoe

I was testing this on the SEMS machine, but I thought I had only made local changes. I'm checking now, but I did not make any commit that I was aware of.

@MicheldeMessieres, I don't know that I can explain why this latest set of builds shows this failure, since the prior CI iteration that tested Zoltan2 as part of Tpetra changes shown at:

passed. Those are the only changes to Trilinos in the last 24 hours that would explain this.

Very strange.

But the GCC 7.2.0 build this morning showed this test passing at:

which shows that this problem does not exhibit itself with all compilers on all machines.

bartlettroscoe added a commit that referenced this issue Mar 16, 2018
This test hung in both the Linux-GCC-4.8.4-MPI_RELEASE_DEBUG_SHARED_PT_CI build on
ceerws1113 and the Linux-GCC-4.8.4-MPI_RELEASE_DEBUG_SHARED_PT_CI_AAOP build
on crf450.  Therefore, this is not a fluke and must be disabled.  See #2397 for
more details.

Build/Test Cases Summary
Enabled Packages: Zoltan2
Disabled Packages: PyTrilinos,Claps,TriKota
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=100,notpassed=0 (21.89 min)
@bartlettroscoe

I pushed commit 270c3cd, which disables this test in the CI build (and therefore, it would seem, in all Trilinos automated builds). But this test will still get enabled and run in all other builds of Trilinos that enable the Scotch TPL.

kyungjoo-kim pushed a commit to kyungjoo-kim/Trilinos that referenced this issue Mar 16, 2018
@kddevin

kddevin commented Mar 19, 2018

Thank you for reporting this issue. It is now fixed in Trilinos/develop with PR #2415.
When you re-enable the test, please let us know if you see further problems.

bartlettroscoe added a commit that referenced this issue Mar 19, 2018
Issue should be resolved now.

Build/Test Cases Summary
Enabled Packages: Zoltan2
Disabled Packages: PyTrilinos,Claps,TriKota
0) MPI_RELEASE_DEBUG_SHARED_PT => passed: passed=101,notpassed=0 (10.10 min)
@bartlettroscoe

The test Zoltan2_OrderingScotch_MPI_4 was shown to be newly added and passing in the CI iteration this morning shown at:

(see the +1 superscript by 101 passing Zoltan2 tests) and shown more explicitly at:

This issue looks resolved. Closing as complete.
