Thyra: EpetraOperatorWrapper_UnitTests.cpp cannot open "Trilinos_Util_CrsMatrixGallery.h" blocking all PRs changing Tpetra #10842
Comments
That is very interesting. That error is also being reported in some PR builds as described in #10823 (comment) and #10823 (comment). But when I tried to reproduce the error in #10823 (comment) and #10823 (comment) I could not with the tip of 'develop'. Can you try using a more recent version of 'develop' and see what happens? In the meantime, I will see if I can reproduce this error on 'vortex' (where the PR tester is reporting an error in some PR builds). |
@bartlettroscoe yes, prepping the build now |
I currently don't have access to 'blake' so I can't reproduce this failure. I will try on 'vortex' as per above. |
@bartlettroscoe oop, I have to hold off a bit, "no space left on device" message on blake. Will launch the build once disk space is cleared (I contacted an admin for help with this) |
Can "catastrophic error" type issues be triggered when disk space is low? If so, I suppose this could be a potential culprit for the intermittent issues with PR testing if the failures occur consistently on the same node(s)? |
But why would it report a specific header file being missing? When you run out of disk space (with no swap space) usually the compiler aborts with no diagnostic feedback. |
Yeah, I suppose it wouldn't make sense to report a specific header file being missing (I had forgotten the typical abort behavior when out of space) |
@ndellingwood, if you can rerun that exact same build for the same repo version on 'blake' without running out of disk space, that would be very good to know. That might explain why so many PR builds are failing with that error. But at face value, that error makes no sense. Those files have not changed in a long time as shown by the git commands:
and
So those files have not changed or moved in 7 years. The only recent changes to Thyra or TriUtils are:
It is very unlikely that those changes would affect the compilation of that file. Also, PR builds passed when those changes were merged to 'develop'. |
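For anyone who wants to repeat this kind of "when did this file last change" check, here is a minimal sketch. It builds a throwaway repository so it runs on its own; the file names and the helper name `last_change` are placeholders for illustration, not the actual Trilinos paths or the exact commands used above.

```shell
#!/bin/sh
# Sketch: report the most recent commit that touched a given path.
# The repository, file names, and function name here are all hypothetical.

last_change() {
  # Prints "<short-hash> <date> <subject>" of the last commit touching "$1".
  git log -1 --format='%h %ad %s' --date=short -- "$1"
}

# Build a tiny demo repository in a temporary directory.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo '// header' > some_header.h
git add some_header.h
git commit -q -m "Add header"

echo 'other' > other.txt
git add other.txt
git commit -q -m "Unrelated change"

# The header was last touched by the first commit, not by the latest one.
last_change some_header.h
```

Limiting `git log` to a path with `--` is what makes the result specific to that file, so later unrelated commits do not show up.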
Note the matching internal issue TRILINOSHD-150. |
@bartlettroscoe I reran the build a couple times, first was with -j20 to reproduce the error, then again with -j1 to see if that made a difference before posting the issue; in both cases the "catastrophic error" occurred. I'll rebuild and update after the system's "No space left on device" issue is resolved (I encountered the "No space left on device" message when running |
May want to check that Can also look at |
I gave my best attempt to reproduce the build error:
for the PR build:
on 'vortex', consistently reported for PR #10808 (shown here), with the exact versions claimed in the last few PR iterations (here and here), which are:
and I got a successful build of the executable and ran the test successfully
There is just nothing more that I can do to try to reproduce this failure.
Attempt to reproduce build error on 'vortex' reported in PR #10808 (failed)
Trying to exactly reproduce the build error for Thyra EpetraOperatorWrapper_UnitTests.cpp missing "Trilinos_Util_CrsMatrixGallery.h" on 'vortex' for the PR #10808 that consistently reports the failure:
for the build:
on 'vortex' shown here. The last two PR iterations in PR #10803 here and here reported the exact same commits for the two branches being merged together locally:
So let's create a temp branch that merges those exact two commits together and see if I can reproduce the build error. So on 'vortex' I do:
Interesting, the version on the branch 48a70bc is an ancestor of commit e8a9b49 so no merge was needed. Now let's try and run a build of Thyra:
And I was able to run this with:
So the build and test passed. Darn, that did not reproduce the failure either. |
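The "is an ancestor, so no merge was needed" observation above can be checked directly with `git merge-base --is-ancestor`. A minimal sketch follows, using a throwaway repository with two made-up commits rather than the real 48a70bc and e8a9b49; the helper name `is_contained` is hypothetical.

```shell
#!/bin/sh
# Sketch: test whether one commit is already contained in another.
# The demo repository and the helper name are hypothetical.

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

echo one > file.txt
git add file.txt
git commit -q -m "first"
older=$(git rev-parse HEAD)

echo two >> file.txt
git commit -q -am "second"
newer=$(git rev-parse HEAD)

is_contained() {
  # git merge-base --is-ancestor exits 0 when "$1" is an ancestor of "$2".
  if git merge-base --is-ancestor "$1" "$2"; then
    echo "ancestor"
  else
    echo "not an ancestor"
  fi
}

# The first commit is an ancestor of the second, so merging it in is a no-op.
is_contained "$older" "$newer"
```

When the check reports "ancestor", merging the first commit into the second changes nothing, which matches what was seen above.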
FYI: I just realized that the error:
says that it cannot open the file, not that it can't find it. Perhaps a I am just out of ideas. |
@jjellio thanks for the suggestion, /tmp is pretty empty but /home is quite low on space (96% full) @bartlettroscoe thanks for the update and triage, I'll update tomorrow when I can rebuild with more disk space, hopefully I'll have some useful additional diagnostic info to share |
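A quick way to keep an eye on this before launching a build is to parse `df` output. The sketch below is only an illustration: the helper names `usage_percent` and `check_space` and the 90% threshold are assumptions, not part of any Trilinos or testbed tooling.

```shell
#!/bin/sh
# Sketch: warn when a filesystem is at or above a usage threshold.
# usage_percent and check_space are hypothetical helper names.

usage_percent() {
  # df -P prints POSIX-format output; column 5 of line 2 is the capacity, e.g. "96%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_space() {
  # $1 = path to check, $2 = threshold in percent
  used=$(usage_percent "$1")
  if [ "$used" -ge "$2" ]; then
    echo "WARNING: $1 is ${used}% full"
  else
    echo "OK: $1 is ${used}% full"
  fi
}

check_space /tmp 90
```

Running `check_space /home 90` in the same way would flag a filesystem in the state described above (96% full).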
Note my other two failed attempts to reproduce this build error with other compilers and other configurations in #10823 (comment) and #10823 (comment) as reported on CDash (see here). |
Same failure with builds I launched on Blake overnight :( |
Okay, I will request an account on 'blake' and I will try to reproduce there once I get access. |
@ndellingwood Oh sweet. I am able to reproduce the error on blake using your cmake script!!! |
@bartlettroscoe Sure enough, the compile line is missing the "include" of triutils: VERBOSE=1 make
TriUtils is a required test dependency of adapters/epetra. One thing I notice is that other packages (aztecoo) are using this syntax:
whereas thyra is using
|
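To confirm this kind of problem without reading a long compile line by eye, one option is to list the `-I` directories and look for the package in question. This is a rough sketch: the compile line below is made up for illustration (it is not the real command from the 'blake' build), and `include_dirs` and `has_include_for` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: list the -I include directories on a compile line and check for one package.
# The compile line and its paths are invented for illustration only.

include_dirs() {
  # Split the line on whitespace and print each -I<dir> argument without the -I prefix.
  # Only handles the attached form (-I/path), not "-I /path".
  printf '%s\n' $1 | sed -n 's/^-I//p'
}

has_include_for() {
  # $1 = compile line, $2 = package directory name to look for
  if include_dirs "$1" | grep -q "/$2/"; then
    echo yes
  else
    echo no
  fi
}

line='c++ -I/src/packages/thyra/core/src -I/src/packages/epetra/src -c EpetraOperatorWrapper_UnitTests.cpp'

# No triutils directory appears among the -I flags of this made-up line.
has_include_for "$line" triutils
```

In practice the compile line would come from the `VERBOSE=1 make` output rather than a hard-coded string.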
…rilinos#10842) Mainly pulling in this TriBITS 'master' snapshot update to address trilinos#10842.
This is NOT fixed. Reopening. |
With the merge of PR #10813, this should be fixed. If you look at the PR builds for PR #10775 that started running just after the merge of PR #10813, they seem to show that these errors are gone now (see CDash here). If you look at the history for the build Therefore, I think this is ready to close. But I will leave "In review" for a few days just to be sure. |
NOTE: As shown in this query over the last 2 days, all the build errors on 'vortex' caused by this problem have disappeared (except for real failures for PR #10829), including for the PRs #10808 (see this query), #10802 (see this query), and #10775 (see this query). That is sufficient evidence to close this issue. |
Bug Report
@trilinos/thyra
Internal issues:
Description
Compilation of Trilinos fails on SKX architecture (Serial and OpenMP backends, Blake testbed) in the configuration tested below with output message (including compilation line):
I haven't attempted bisection yet to determine the PR or SHA where the breakage began.
Steps to Reproduce
cat
from my script below (Blake testbed) disables all tests but thyra, contains boilerplate for TPLs, and explicitly enables the serial backend for kokkos and tpetra
Cross-referencing #10823 (comment) where similar issues were mentioned