-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the TriBITS Netcdf and MPI TPLs #11175
Improve the TriBITS Netcdf and MPI TPLs #11175
Conversation
this is required to fix TriBITSPub/TriBITS#534
when this CMake cache variable is set to a true value, the TriBITS Netcdf TPL code will only look for externally installed CMake config files for CMake packages called netCDF and possibly hdf5. Such files are installed by the latest versions of NetCDF and HDF5. This effectively allows loading NetCDF and its HDF5 dependency using the most modern CMake approach. This is needed to resolve TriBITSPub/TriBITS#533
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: ibaned |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see if we can accomplish this without mass duplication if what is implemented in TriBITS. Please see above comment.
tribits_tpl_find_include_dirs_and_libraries(MPI) | ||
tribits_tpl_find_include_dirs_and_libraries( | ||
MPI | ||
REQUIRED_HEADERS mpi.h) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it breaks the configure of Trilinos, see here, for example showing:
Processing enabled external package/TPL: MPI (enabled explicitly, disable with -DTPL_ENABLE_MPI=OFF)
-- MPI_LIBRARY_NAMES=''
-- Searching for headers in MPI_INCLUDE_DIRS=''
-- Searching for a header file in the set "mpi.h":
-- Searching for header 'mpi.h' ...
-- ERROR: Could not find a header file in the set "mpi.h"
-- ERROR: Could not find the include directories for TPL 'MPI'!
-- TIP: If the TPL 'MPI' is on your system then you can set:
-DMPI_INCLUDE_DIRS='<dir0>;<dir1>;...'
to point to directories where these header files may be found.
Or, just set:
-DTPL_MPI_INCLUDE_DIRS='<dir0>;<dir1>;...'
to point to the include directories which will bypass any search for
header files and these include directories will be used without
question in the build. (But this will result in a build-time error
obviously if the necessary header files are not found in these
include directories.)
-- ERROR: Failed finding all of the parts of TPL 'MPI' (see above), Aborting!
-- NOTE: The find module file for this failed TPL 'MPI' is:
/scratch/trilinos/jenkins/ascic115/workspace/Trilinos_PR_clang-10.0.0/Trilinos/cmake/tribits/core/std_tpls/FindTPLMPI.cmake
which is pointed to in the file:
/scratch/trilinos/jenkins/ascic115/workspace/Trilinos_PR_clang-10.0.0/Trilinos/TPLsList.cmake
TIP: Even though the TPL 'MPI' was explicitly enabled in input,
it can be disabled with:
-DTPL_ENABLE_MPI=OFF
which will disable it and will recursively disable all of the
downstream packages that have required dependencies on it.
When you reconfigure, just grep the cmake stdout for 'MPI'
and then follow the disables that occur as a result to see what impact
this TPL disable has on the configuration of Trilinos.
CMake Error at cmake/tribits/core/package_arch/TribitsProcessEnabledTpl.cmake:148 (message):
ERROR: TPL_MPI_NOT_FOUND=TRUE, aborting!
Call Stack (most recent call first):
cmake/tribits/core/package_arch/TribitsGlobalMacros.cmake:1445 (tribits_process_enabled_tpl)
cmake/tribits/core/package_arch/TribitsProjectImpl.cmake:197 (tribits_process_enabled_tpls)
cmake/tribits/core/package_arch/TribitsProject.cmake:92 (tribits_project_impl)
CMakeLists.txt:109 (TRIBITS_PROJECT)
-- Configuring incomplete, errors occurred!
See also "/scratch/trilinos/jenkins/ascic115/workspace/Trilinos_PR_clang-10.0.0/pull_request_test/CMakeFiles/CMakeOutput.log".
The question is, why can't your MPI just supply the compiler wrappers and mpiexec
? There should be no need to specifically look for mpi.h
. The include dirs should be automatically handled by the MPI compiler wrappers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I didn't see the question at the end of this comment, didn't mean to ignore it. I guess philosophically I think that in a full-modern-CMake world compiler wrappers should not be necessary and so I'm trying as hard as possible to do the whole build without them, however I may soon have to give in and implement them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ibaned, you can use raw compilers with MPI. See "Configuring to build using raw compilers and flags/libraries:" at:
This has been supported and used with Trilinos for at least 14 years.
# instead of using tribits_extpkg_create_imported_all_libs_target_and_config_file, | ||
# we will do what it does ourselves because we need to bring in two | ||
# inner find packages (netCDF and hdf5) because netCDF does not properly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really unfortunate chuck of duplication and would be a terrible precedent to set. Might we consider having the TriBITS Netcdf TPL depend on the TriBITS HDF5 TPL and then have the Netcdf TPL depend on the HDF5 TPL as I tried to outline in TriBITSPub/TriBITS#533 (comment)? I think that would be very compact and you could essentially accomplish what this large amount of duplicated code is doing. Is that something you would be willing to consider? I can add a commit to this branch that attempts that (but I don't have a way to test it with Netcdf_ALLOW_MODERN=ON
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm fine with moving in that direction. If you'd be willing to add a commit to this branch that would be super helpful, and I could test the modern option on my end.
@ibaned, for an easier way to run the Trilinos GenConfig PR builds locally, try out: That way, one can debug these issues without having to wait in the Trilinos PR testing system (and bog down builds for other PRs that are ready to merge). |
…ITS#533) This makes Netcdf depend on HDF5 using modern CMake targets. ToDo: Finish this commit message.
@ibaned, as of the commit I can't fully test this because I don't have the updated versions of HDF5 and Netcdf installed on any machine that I have access to. Please give this a test and let me know what happens (or give me access to a machine with the updated HDF5 and Netcdf installed so I can test this myself). |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: ibaned |
@bartlettroscoe : I cancelled the jobs running on the problematic nodes reported in #10999 (comment). Please try adding a RETEST label. CC: @DaveBeCoding |
Thanks @bartlettroscoe ! I confirmed that the MPI part works for me, let me check the improvements to HDF5 dependence and report back |
I've had to make a couple of changes for the HDF5 and Netcdf setup to work in modern mode, I'll commit these here when the build completes |
1. set the minimum version to the one I'm testing with 2. use HDF5_EXPORT_LIBRARIES since the targets defined are hard to predict
1. don't expect HDF5 target to be set. in the general case, Netcdf may not use HDF5, but also HDF5 has no predictable catch-all target 2. TPL_Netcdf_NOT_FOUND still has to be manually set to FALSE
Okay, with the latest changes in this branch this works in my use case with the MODERN option on and custom MPI! |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
It looks like the PR build for this still thinks that MPI has REQUIRED_HEADERS even though you reverted that change? is there some mechanism for clearing the CMake cache of the PR build that we need to do? |
Hi @ibaned: Each PR build configures and builds from scratch. I see that this MPI build configured properly: https://trilinos-cdash.sandia.gov/build/545791/configure. Please try adding a RETEST... |
@ibaned: Note that I sorted by start time in CDash Test Results for PR# 11175 to find the most recent CDash results. |
@ibaned and @e10harvey, I took off @e10harvey, how do these bad nodes (or at least nodes where the SEMS modules are broken) keep getting into the pool of Jenkins PR build machines? |
It should work now. I've added both of you to the related help desk ticket. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: ibaned |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
@e10harvey a
|
@ibaned, this is a long known random error on this build. See:
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: ibaned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for those final improvements!
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
@bartlettroscoe I've made the next round of improvements you pointed out. Once the CI on this PR gets figured out, should I open a PR on TriBITS with these changes? |
@ibaned, I can take care of that. (It is just careful usage of And unless something new comes up, the next PR testing iteration is likely to pass and allow the merge. |
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: ibaned |
FYI: The matching changes to the TriBITS repo are in: I will merge that PR as soon as this PR merges (fingers crossed on PR testing). |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-7.2.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-17.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-10.0.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-10.1.243
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ bartlettroscoe ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 11175: IS A SUCCESS - Pull Request successfully merged |
@trilinos/tribits
Motivation
This is motivated by issues TriBITSPub/TriBITS#533 and TriBITSPub/TriBITS#534, which are respectively:
Stakeholder Feedback
As both the TriBITS contributor and also the ASC IC PI stakeholder, I would really like the capabilities listed in Motivation.
Testing
That's what this branch is for!