Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update MPI, HDF5, and Netcdf TriBITS find modules (#533, #534) #535

Merged
merged 10 commits into from
Oct 28, 2022

Conversation

bartlettroscoe
Copy link
Member

@bartlettroscoe bartlettroscoe commented Oct 26, 2022

Description

These commits address #533 and #534

These commits were pulled off of the branch from:

(which also matches the patches in sandialabs/seacas#336).

This seems to maintain near perfect backwards compatibility.

Tasks

ibaned and others added 9 commits October 26, 2022 15:11
when this CMake cache variable is set to a true
value, the TriBITS Netcdf TPL code will only look
for externally installed CMake config files for
CMake packages called netCDF and possibly hdf5.
Such files are installed by the latest versions
of NetCDF and HDF5.
This effectively allows loading NetCDF and its
HDF5 dependency using the most modern CMake
approach.

This is needed to resolve TriBITSPub#533
This reverts commit e49070d29fdd8c1d525bb22c52d7b2745271d7e2.

This causes configure failures.
This makes Netcdf depend on HDF5 using modern CMake targets.

ToDo: Finish this commit message.
1. set the minimum version to the one I'm testing
   with
2. use HDF5_EXPORT_LIBRARIES since the targets
   defined are hard to predict
1. don't expect HDF5 target to be set.
   in the general case, Netcdf may not use HDF5,
   but also HDF5 has no predictable catch-all target
2. TPL_Netcdf_NOT_FOUND still has to be manually set
   to FALSE
@bartlettroscoe
Copy link
Member Author

Okay, given above, hopefully these new FindTPL[HDF5|Netcdf].cmake modules will work with newer releases of HDF5 and NetCDF as well.

I will merge this after the merge of:

We can always tweak this later if needed.

@bartlettroscoe
Copy link
Member Author

@bartlettroscoe
Copy link
Member Author

Hello @KyleFromKitware, could you please take a look at this PR. This is an example of where this updated TriBITS external package/TPL system provides the tools to fix up these external packages that are not quite following proper usage of CMake. If you focus on the new code added to these modules, you will see it is pretty minimal (and is mostly print statements).

And, actually, I just realized that I did not add a test for TriBITS core change 7fd57e5 (#535) to protect it going forward. Since there is no huge rush to merge this PR, I will do that when I get back to MD.

tribits/common_tpls/FindTPLNetcdf.cmake Outdated Show resolved Hide resolved
tribits/common_tpls/FindTPLNetcdf.cmake Outdated Show resolved Hide resolved
tribits/common_tpls/FindTPLHDF5.cmake Show resolved Hide resolved
tribits/common_tpls/FindTPLNetcdf.cmake Show resolved Hide resolved
endif()

endif()

set(HDF5_FOUND_MODERN_CONFIG_FILE ${INTERNAL_IS_MODERN} CACHE BOOL "True if HDF5 was found by the modern method" FORCE)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

${INTERNAL_IS_MODERN} should be quoted to ensure that set() receives a value... though I see from above that it will always have a value, so perhaps it doesn't matter. Still, quoting it makes this explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point I never considered. If INTERNAL_IS_MODERN is empty (or undefined), this statement just produces a regular list variable with the value:

"CACHE;BOOL;True if HDF5 was found by the modern method;FORCE"

Is that correct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out that if it's empty or undefined, it will simply set a cache variable with an empty value. However, readers won't know this just from glancing at the code, and for that reason I still recommend quoting it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for regular command calls, not quoting an arg can help catch errors where an input args is empty. For example, if you have a function:

function(my_func  arg1  arg2  arg3)
   ...
endfunction()

and calling this with:

my_func(${var1} ${var2} ${var3})

will result in a CMake error if any of these 3 vars are empty.

But this logic does not apply to set() statements.

We really need a linter for CMake code to catch issues like missing quotes for ${val} in set(<var> ${val} CACHE ...) calls.

Copy link
Member Author

@bartlettroscoe bartlettroscoe Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it turns out that if it's empty or undefined, it will simply set a cache variable with an empty value. However, readers won't know this just from glancing at the code, and for that reason I still recommend quoting it.

@KyleFromKitware, then what is the benefit of quoting the ${val} argument in set(<var> ${val} CACHE ...) if it does not impact behavior? In my example above, quoting the argument can change behavior.

Copy link
Collaborator

@KyleFromKitware KyleFromKitware Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then what is the benefit of quoting the ${val} argument in set(<var> ${val} CACHE ...) if it does not impact behavior?

It assures the reader that there won't be a problem. I had to test the behavior of set(... CACHE) to check that it would work without quoting, whereas I knew instantly just by looking at it that it would work with quoting. I believe it's better to be explicit than implicit. Just a thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. There really needs to be a CMake linter to catch issues like this automatically. We should not expend too much time in PR reviews dealing with low-level language issues like this. That is why they make linters in the first place.

In the future, we will need your help reviewing these types of changes when they are being created and tested in the Trilinos github repo. Once we being this back to TriBITS (as the official repo for these common TPLs) we can't test them easily.

This file gets included in a TriBITS project which must have CMake 3.17+.
@bartlettroscoe
Copy link
Member Author

I am looking over how to test 7fd57e5 (#535) and it will require quite a lot of work and a lot of code. That code in that file has not changed for years and is unlikely to ever change. (Even the large refactoring in #299 did not touch that code at all. Therefore, I have determined that the cost of adding a test for that change is larger than the value it will have. (This is not a decision I take likely.)

Copy link
Collaborator

@rabartlett1972 rabartlett1972 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the other changes are very hard to test as noted, I will approve this PR to be merged.

In the future, we need to be reviewing these changes in the Trilinos and SEACAS PRs where they can be tested.

@bartlettroscoe bartlettroscoe dismissed KyleFromKitware’s stale review October 28, 2022 16:23

Addressed one change but the other suggested changes are quite hard to fully test and these are not defects

@bartlettroscoe bartlettroscoe merged commit 9fd56d2 into TriBITSPub:master Oct 28, 2022
if (TPL_ENABLE_MPI)
set(REQUIRED_LIBS_NAMES ${REQUIRED_LIBS_NAMES} pnetcdf)
endif()
if (Netcdf_ALLOW_MODERN AND HDF5_FOUND_MODERN_CONFIG_FILE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After seeing how to use this in SEACAS --
This does not seem to work if HDF5 is added as a dependency only because of netCDF since at this point, I don't think that we have yet found HDF5, so HDF5_FOUND_MODERN_CONFIG_FILE will not be set?

At least for SEACAS, I think that the Netcdf_ALLOW_MODERN should drive this and determine whether to find HDF5 the modern/legacy way? Does that make sense?

Or, do I reorder things such that CGNS (which depends on HDF5) comes first and then netCDF?
@bartlettroscoe @ibaned

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, do I reorder things such that CGNS (which depends on HDF5) comes first and then netCDF?

@gsjaardema, you specify the dependencies as: HDF5 <-- CGNS <-- Netcdf

See:

It looks like GGNS already has a required dependency on HDF5 as per:

tribits_extpkg_define_dependencies( CGNS
DEPENDENCIES HDF5)

But I had no idea that Netcdf depended on CGNS?

Also not that, currently, TriBITS does not automatically enable upstream TPL dependencies for an enabled TPL. So the configure script will need to manually enable all of the TPLs it wants. (See NOTES at bottom of slide 16 "TriBITS External Packages/TPLs Dependencies (New!)" in the presentation TriBITS Modernization Update I gave last Thursday, Oct 27, 2022.)

@gsjaardema
Copy link
Contributor

NetCDF doesn't depend on CGNS. They both depend on HDF5, so I thought that maybe configuring CGNS first would then set the MODERN cmake for HDF5 which would then allow the if (Netcdf_ALLOW_MODERN AND HDF5_FOUND_MODERN_CONFIG_FILE) to work...

If netCDF is configured before CGNS, then that if won't be triggered because it is netCDF that triggers HDF5 if I am understanding things correctly (probably not)

@bartlettroscoe
Copy link
Member Author

If netCDF is configured before CGNS, then that if won't be triggered because it is netCDF that triggers HDF5 if I am understanding things correctly (probably not)

@gsjaardema, the HDF5 TPL will always be configured before Netcdf TPL, because the TPLs are processed in the order they are listed in the TPLsList.cmake file which is the case with SEACAS as shown at:

https://github.com/sandialabs/seacas/blob/a9634da294d7d4bc64145fa33969de6be10d31b6/TPLsList.cmake#L60-L62

@gsjaardema
Copy link
Contributor

This is a portion of the output of my CMake configuration:

Processing enabled external package/TPL: Netcdf (enabled explicitly, disable with -DTPL_ENABLE_Netcdf=OFF)
-- Using find_package(Netcdf ...) ...
--      NetCDF_ROOT is /Users/gdsjaar/src/seacas
--      netCDF_CONFIG is /Users/gdsjaar/src/seacas/lib/cmake/netCDF/netCDFConfig.cmake
-- NetCDF requires HDF5
-- NetCDF depends on HDF5
-- Updating NetCDF_LIBRARIES and NetCDF_INCLUDE_DIRS
-- Found HDF5 CMake configuration file ( directory /Users/gdsjaar/src/seacas/cmake/hdf5-config.cmake )
-- HDF5 Version: 1.10.9
--      HDF5_INCLUDE_DIRS      =/Users/gdsjaar/src/seacas/include
--      HDF5_LIBRARY_TARGETS   =hdf5_hl;hdf5
--      HDF5_LIBRARIES         =/Users/gdsjaar/src/seacas/lib/libhdf5_hl.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5.dylib
--      HDF5_LIBRARIES_EXPORT  =/Users/gdsjaar/src/seacas/lib/libhdf5_hl.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5.dylib
--      HDF5_LINK_LIBRARIES    =
--      HDF5_IS_PARALLEL       =NO
-- Found the following HDF5 component libraries
--      hdf5
--      hdf5_hl
--      HDF5 Components not found: static;shared;CXX;Fortran;CXX_HL;Fortran_HL;Java;Tools
--      HDF5_TOOLS_FOUND: h52gif;h5copy;h5debug;h5diff;h5dump;h5import;h5jam;h5ls;h5mkgrp;h5stat
-- Found HDF5: /Users/gdsjaar/src/seacas/include
-- NetCDF does not require PNetCDF
-- Found NetCDF: /Users/gdsjaar/src/seacas/lib/libnetcdf.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5_hl.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5.dylib
-- NetCDF Version: 4.9.0
--      NetCDF_NEEDS_HDF5        = yes
--      NetCDF_NEEDS_PNetCDF     = no
--      NetCDF_PARALLEL          = False
--      NetCDF_INCLUDE_DIRS      = /Users/gdsjaar/src/seacas/include;/Users/gdsjaar/src/seacas/include
--      NetCDF_LIBRARIES         = /Users/gdsjaar/src/seacas/lib/libnetcdf.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5_hl.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5.dylib
--      NetCDF_BINARIES          = ncdump;ncgen;nccopy
-- Netcdf_LIBRARY_NAMES='netcdf'
-- TPL_Netcdf_LIBRARIES='/Users/gdsjaar/src/seacas/lib/libnetcdf.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5_hl.dylib;/Users/gdsjaar/src/seacas/lib/libhdf5.dylib'
-- TPL_Netcdf_INCLUDE_DIRS='/Users/gdsjaar/src/seacas/include;/Users/gdsjaar/src/seacas/include'
-- TPL_Netcdf_PARALLEL is False

Doesn't that show that Netcdf is being found and then since netcdf depends on HDF5, then HDF5 is found/processed?

Note that HDF5 is an optional dependency wherever it is mentioned in the SEACAS Dependencies.cmake:

packages/seacas/libraries/exodus/cmake/Dependencies.cmake:  LIB_OPTIONAL_TPLS Pthread HDF5 Pnetcdf MPI

Thanks for trying to explain.

@bartlettroscoe
Copy link
Member Author

@gsjaardema, can you please attach the full cmake STDOUT+STDERR?

What are the dependencies of Pnetcdf? Do we need to add FindTPL<tplName>Dependencies.cmake entries for those as well?

endif()
if (Netcdf_ALLOW_MODERN)

set(minimum_modern_HDF5_version 1.13.2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibaned @bartlettroscoe
Sorry for not noticing this prior to merging.
What is needed in the HDF5-1.10.X and HDF5-1.12.X versions to make them usable for this. Both of those versions are still in development, so it would be good to fix their CMake configurations if possible.

I can either contact the HDF5 group with the suggested changes, or one of you can also. Note that we have priority support with HDF5, so they will listen and most likely make useful changes that we suggest.

@gsjaardema
Copy link
Contributor

What are the dependencies of Pnetcdf? Do we need to add FindTPL<tplName>Dependencies.cmake entries for those as well?

PnetCDF only depends on MPI and netCDF optionally depends on PnetCDF in a parallel build. SEACAS only accesses / depends on PnetCDF if netCDF was configured in parallel and with PnetCDF capability enabled. There isn't any direct access from SEACAS that bypassed netCDF and hits PnetCDF.

@gsjaardema
Copy link
Contributor

@gsjaardema, can you please attach the full cmake STDOUT+STDERR?
seacas-config-debug.txt

@bartlettroscoe
Copy link
Member Author

PnetCDF only depends on MPI and netCDF optionally depends on PnetCDF in a parallel build. SEACAS only accesses / depends on PnetCDF if netCDF was configured in parallel and with PnetCDF capability enabled. There isn't any direct access from SEACAS that bypassed netCDF and hits PnetCDF.

Then this dependency between the TriBITS Pnetcdf and Netcdf TPLs should be expressed by updating the file https://github.com/sandialabs/seacas/blob/master/cmake/tribits/common_tpls/FindTPLNetcdfDependencies.cmake to be:

tribits_extpkg_define_dependencies( Netcdf
  DEPENDENCIES  Pnetcdf HDF5)

In case we do turn on TPL dependencies by default, this might get changed to:

set(Netcdf_Pnetcdf_dep "")
if (TPL_ENABLE_MPI)
  set(Netcdf_Pnetcdf_dep  Pnetcdf)
endif()
tribits_extpkg_define_dependencies( Netcdf
  DEPENDENCIES  ${Netcdf_Pnetcdf_dep} HDF5)

That would mean that if the user set TPL_ENABLE_Netcdf=ON and TPL_ENABLE_MPI=ON on input, then TriBITS would try to automatically enable Pnetcdf (once we turn on indirect upstream TPL dependency enables).

How common is it to enable MPI but not build Netcdf with Pnetcdf? That is, in the future, do we want the user setting TPL_ENABLE_Netcdf=ON and TPL_ENABLE_MPI=ON to automatically enable Pnetcdf? Note that the user will always be able to set TPL_ENABLE_Pnetcdf=OFF and the Pnetcdf TPL will not get enabled. Is that logical behavior? What is the least surprising behavior?

@bartlettroscoe
Copy link
Member Author

@gsjaardema, the problem is that the HDF5 TPL is not being enabled as can be see in the output seacas-config-debug.txt showing:

Final set of enabled external packages/TPLs:  Netcdf CGNS METIS Matio X11 DLlib fmt 7

Final set of non-enabled external packages/TPLs:  GTest Zlib Pthread MPI HDF5 Pnetcdf DataWarp ParMETIS Pamgen CUDA Kokkos Faodel Cereal ADIOS2 Catalyst2 15

Someone needs to set TPL_ENABLE_HDF5=ON.

Currently, TriBITS is not automatically enabling the HDF5 TPL because of the Netcdf dependency on it (because it would break backward compatibility) but in the future it could.

Can you add -D TPL_ENABLE_HDF5:BOOL=ON to the configure scripts? Or should we bite the bullet and have TriBITS automatically enable upstream TPL dependencies as described above? (The latter will break some configure scripts but that is easy to fix by just doing a hard disable with -D TPL_ENABLE_<tpl-that-should-be-disabled>=OFF.)

@gsjaardema
Copy link
Contributor

... deleted...
Then this dependency between the TriBITS Pnetcdf and Netcdf TPLs should be expressed by updating the file https://github.com/sandialabs/seacas/blob/master/cmake/tribits/common_tpls/FindTPLNetcdfDependencies.cmake to be:

tribits_extpkg_define_dependencies( Netcdf
  DEPENDENCIES  Pnetcdf HDF5)

I still confused as to why the netCDF configuration isn't the place where this information is handled. That is the current behavior and it seems to be working in most situations...

netCDF has a few optional dependencies depending on how it is configured:

  • hdf5
  • pnetcdf
  • dap, curl, nczarr,
  • compression: szip, bz2, zlib, zstd, blosc
  • libxml2
  • mpi

We will also have that issue with HDF5 soon if we start adding some of the advanced HDF5 capabilties (VOLs will add optional dependencies); multiple compression library options.

As I see it, only the netCDF configuration files can fully know what optional dependencies are present in the library...

@bartlettroscoe
Copy link
Member Author

As I see it, only the netCDF configuration files can fully know what optional dependencies are present in the library...

The netCDFConfig.cmake file is broken. It does not contain a call to find_dependency(HDF5). @ibaned knows more about that.

TriBITS allows us to fix issues with these external packages when they are not doing the correct behavior.

@gsjaardema
Copy link
Contributor

The netCDFConfig.cmake file is broken. It does not contain a call to find_dependency(HDF5). @ibaned knows more about that.

TriBITS allows us to fix issues with these external packages when they are not doing the correct behavior.

Ok, that was the missing piece for me --

@ibaned
Copy link
Contributor

ibaned commented Nov 1, 2022

Hi! I'm coming in late to the conversation, I gather essentially that this PR broke something? Could we maybe open an issue on the SEACAS or TriBITS repositories just to track what broke and I can get a better understanding?

@gsjaardema
Copy link
Contributor

@ibaned Nothing broke that I am aware of, I am just not sure it is working the way it should -- I am unable to get the MODERN config to work due to the way that the HDF5 dependency is currently added via netCDF and not as a direct SEACAS dependency.

@ibaned
Copy link
Contributor

ibaned commented Nov 1, 2022

Ah I see! If it helps, here is the CMake configuration that I'm using:

  "-DSeacas_ENABLE_Fortran=OFF"
  "-DSeacas_ENABLE_SEACASExodus=ON"
  "-DSEACASExodus_ENABLE_MPI=ON"
  "-DSEACASExodus_ENABLE_SHARED=OFF"
  "-DSeacas_ENABLE_SEACASAprepro_lib=ON"
  "-DTPL_ENABLE_HDF5=ON"
  "-DNetcdf_ALLOW_MODERN=ON"
  "-DnetCDF_ROOT=${CAPP_INSTALL_ROOT}/netcdf-c"
  "-DHDF5_ROOT=${CAPP_INSTALL_ROOT}/hdf5"
  ${SEACAS_MPI_OPTIONS}
  "-DSeacas_VERBOSE_CONFIGURE=OFF"
  "-DTRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES_VERBOSE=OFF"
  "-DBUILD_SHARED_LIBS=OFF"

@gsjaardema
Copy link
Contributor

It is probably the "-DTPL_ENABLE_HDF5=ON" that does it.

Also, do you know what HDF5-1.10.X is missing to allow it to use the MODERN method...

@ibaned
Copy link
Contributor

ibaned commented Nov 1, 2022

Maybe nothing! I was being overly conservative in listing just the versions installed on my system.

@bartlettroscoe
Copy link
Member Author

@gsjaardema, can we set up a short MS Teams meeting so I can explain to you what is happening and how TriBITS is fixing this?

@bartlettroscoe
Copy link
Member Author

TriBITS allows us to fix issues with these external packages when they are not doing the correct behavior.

Ok, that was the missing piece for me --

@gsjaardema, the key is that when setting TPL_ENABLE_HDF5=ON, TriBITS generates a file NetcdfConfig.cmake in the build and install trees that basically does:

set(HDF5_DIR ...)
find_dependency(HDF5)
set(netCDF_DIR ...)
find_dependency(netCDF)
add_library(Netcdf::all_libs IMPORTED INTERFACE)
...

Look at the file:

<buildDir>/external_packages/Netcdf/NetcdfConfig.cmake

and you will see this.

And it is possible to extend this TriBITS support to automatically fix other situations as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants