ww3 scotch #849
model/src/CMakeLists.txt
endif()
endif()

if("PDLIB" IN_LIST switches)
Why not have this in the same "if("PDLIB" IN_LIST switches)" if statement as above?
Done
@ukmo-ccbunney @mickaelaccensi @sbrus89 - This PR update requires the SCOTCH library to be built for regression testing, assuming you are running unstructured grid tests. You should therefore all review this PR, which will require building SCOTCH. Please let us know if you have any questions, issues or concerns.
@aliabdolali I believe the automatic checks are failing because model/bin/switch_ite_pdlib has neither the METIS nor the SCOTCH switch. I believe switch_ite_pdlib, switch_ugdev2, switch_USACE_1 and switch_USACE_2 should all have SCOTCH added, and then maybe @mickaelaccensi can weigh in on whether he would like switch_Ifremer2_pdlib to have SCOTCH or METIS.
Good catch, I was wondering why the GitHub Actions were failing. I added SCOTCH to all switch files with PDLIB. For Ifremer, I added METIS; if Mickael wants, I will change it to SCOTCH.
Still failing... @MatthewMasarik-NOAA any thoughts?
@aliabdolali maybe this could be of use: https://github.com/NOAA-EMC/WW3/wiki/Code-Management#automatic-tests-update-for-github-action
Awesome, I forgot that I added myself to the wiki. Thanks for pointing me to it.
Looks like you guys solved it. I'll check back if these current builds fail.
@aliabdolali looks like the same failure with It also mentions log file
I've managed to run the
Hi @ukmo-ccbunney, thanks for checking SCOTCH on your HPC. And yes, since the decomposition is different, we expect changes in the results.
We will sort that out as we check the b4b for the explicit scheme...
SCOTCH_PATH updates for RDHPCS machines
@thesser1 @sbrus89 @mickaelaccensi @ukmo-ccbunney we hope to have official reviews from each of you by 3/1. Please let us know if you've run into issues, have concerns, etc.
@aliabdolali please merge the develop branch into your branch for this PR to fix the CI issues.
Your fix did not solve the problem for either the wise PR or the scotch PR.
@aliabdolali happy to help look into this, but for this PR I see that the tests are still running and I do not yet see a failure. Did I miss something on this PR? I'll check the other PR for a failure next.
Hi all, I wanted to point out I've updated the SCOTCH build description on our FAQ to cover how to turn pthreads off in the cmake step. See the bottom of: https://github.com/NOAA-EMC/WW3/wiki/FAQs-page#how-to-install-scotch
This is great, Matt. Could you add a minimum GNU version to the wiki, so users know what the requirements are? The lowest version I tried was 9.2, but you might know better than me.
Hi @aliabdolali, @JessicaMeixner-NOAA created an issue (https://gitlab.inria.fr/scotch/scotch/-/issues/21) on the SCOTCH gitlab page a few days ago asking about this. We haven't heard a response yet but will update here once we do.
I have run regression tests ww3_tp2.17, ww3_tp2.6 and ww3_tp2.21 on our dev HPC machine using GNU gcc/gfortran version 12.1.0.
SCOTCH compiled fine and the regression tests ran ok. There are small differences compared to the develop branch, as expected.
I could not run on our operational HPC as the compiler and Bison versions are too old.
I have not tried with the Cray compiler on our dev HPC yet, only GNU.
Super @ukmo-ccbunney
I'm completing my review and will post momentarily. @thesser1, if you're available to add your review as well, please do.
@mickaelaccensi and @sbrus89 - I hope you are both still testing this. If you have any issues please let us know. @MatthewMasarik-NOAA and I hope that we will be able to merge this at the end of today. If issues arise we can always come back to address them. @thesser1 - we should get your approval and will wait for it, so if there are issues we should be aware of, or if you have a known timeline for when we should get this review, please let us know.
Code review
Pass.
Testing
Pass.
I've attached the matrix.comp output for hera and orion (same as posted on erdc/pull/10).
Note: In both the orion and hera outputs there were a large number of (0 files differ) entries, which turned out to come from detecting files linked from the regtest input/ directory to the work*/ directories. For this reason, and because of the ongoing mod_def issue for unstructured grids, I added some annotation to the Summary output (for orion; it is the same for the same files on hera).
orion
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
1) (0 files differ): due to file (track_i.outer) linked from input/ to work/
--
mww3_test_02/./work_PR1_c (0 files differ)
mww3_test_02/./work_PR3_UNO_d_c (0 files differ)
mww3_test_02/./work_PR3_UNO_b (0 files differ)
mww3_test_02/./work_PR2_UQ_d (0 files differ)
mww3_test_02/./work_PR3_UQ_d (0 files differ)
mww3_test_02/./work_PR2_UQ_a (0 files differ)
mww3_test_02/./work_PR1_d (0 files differ)
mww3_test_02/./work_PR2_UQ_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_c_c (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_b (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_b (0 files differ)
mww3_test_02/./work_PR1_a (0 files differ)
mww3_test_02/./work_PR2_UNO_c (0 files differ)
mww3_test_02/./work_PR3_UNO_a (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_a_c (0 files differ)
mww3_test_02/./work_PR1_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UNO_d (0 files differ)
mww3_test_02/./work_PR3_UNO_c (0 files differ)
mww3_test_02/./work_PR3_UQ_a_c (0 files differ)
mww3_test_02/./work_PR2_UNO_d (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UQ_c_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_b (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UNO_b_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_d_c (0 files differ)
mww3_test_02/./work_PR3_UQ_d_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_a_c (0 files differ)
mww3_test_02/./work_PR1_MPI_b (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_b (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UNO_a_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_b_c (0 files differ)
mww3_test_02/./work_PR1_b (0 files differ)
mww3_test_02/./work_PR3_UQ_a (0 files differ)
mww3_test_02/./work_PR3_UNO_c_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_d_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_d (0 files differ)
mww3_test_02/./work_PR2_UNO_a (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UQ_b_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_b_c (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_a (0 files differ)
mww3_test_02/./work_PR2_UNO_b (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_c_c (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UQ_b (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_a (0 files differ)
mww3_test_02/./work_PR1_MPI_d (0 files differ)
mww3_test_02/./work_PR2_UQ_b (0 files differ)
mww3_test_02/./work_PR1_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UQ_c (0 files differ)
--
-- 2) usual non-b4b
mww3_test_03/./work_PR3_UQ_MPI_d2 (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (15 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (12 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (17 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR1_MPI_d2 (11 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)
ww3_tp2.10/./work_MPI_OMPH (7 files differ)
ww3_tp2.16/./work_MPI_OMPH (4 files differ)
ww3_ufs1.3/./work_a (3 files differ)
--
3) (0 files differ): due to file (anl.grbtxt) linked from input/ to work/
--
ww3_ta1/./work_UPD2_U_cap (0 files differ)
ww3_ta1/./work_UPD6_O (0 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_ta1/./work_UPD5_U (0 files differ)
ww3_ta1/./work_UPD2_U (0 files differ)
ww3_ta1/./work_UPD2_O (0 files differ)
ww3_ta1/./work_UPD3_U_cap (0 files differ)
ww3_ta1/./work_UPD3_U (0 files differ)
ww3_ta1/./work_UPD0F_O (0 files differ)
ww3_ta1/./work_UPD6_U_cap (0 files differ)
ww3_ta1/./work_UPD6_U (0 files differ)
ww3_ta1/./work_UPD5_O (0 files differ)
ww3_ta1/./work_UPD5_U_cap (0 files differ)
ww3_ta1/./work_UPD3_O (0 files differ)
--
4) (1 file differ): due to 'APPLE partioning' string in file OUTPUT_TOY.txt
--
ww3_tp2.14/./work_OASACM6 (1 files differ)
--
5) (1 file differ): due to unstructured mod_def's
--
ww3_tp2.17/./work_mc1 (1 files differ)
ww3_tp2.17/./work_ma (1 files differ)
ww3_tp2.17/./work_mc (1 files differ)
ww3_tp2.17/./work_mb (1 files differ)
ww3_tp2.17/./work_b (1 files differ)
ww3_tp2.17/./work_c (1 files differ)
ww3_tp2.17/./work_a (1 files differ)
ww3_tp2.17/./work_ma1 (1 files differ)
ww3_tp2.6/./work_ST4 (1 files differ)
ww3_tp2.6/./work_pdlib (1 files differ)
ww3_tp2.6/./work_ST0 (1 files differ)
ww3_ts4/./work_ug_MPI (1 files differ)
--
**********************************************************************
************************ identical cases *****************************
**********************************************************************
hera
**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_02/./work_PR1_c (0 files differ)
mww3_test_02/./work_PR2_UNO_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_a (0 files differ)
mww3_test_02/./work_PR1_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_b_c (0 files differ)
mww3_test_02/./work_PR3_UNO_c_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_c (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_d (0 files differ)
mww3_test_02/./work_PR2_UNO_d (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_a_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_d_c (0 files differ)
mww3_test_02/./work_PR1_d (0 files differ)
mww3_test_02/./work_PR2_UQ_c (0 files differ)
mww3_test_02/./work_PR3_UQ_b (0 files differ)
mww3_test_02/./work_PR3_UQ_a (0 files differ)
mww3_test_02/./work_PR1_a (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UNO_a_c (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_b (0 files differ)
mww3_test_02/./work_PR3_UNO_d (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_c_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_a_c (0 files differ)
mww3_test_02/./work_PR2_UQ_a (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_b (0 files differ)
mww3_test_02/./work_PR3_UNO_b_c (0 files differ)
mww3_test_02/./work_PR1_b (0 files differ)
mww3_test_02/./work_PR2_UNO_b (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_c_c (0 files differ)
mww3_test_02/./work_PR2_UQ_d (0 files differ)
mww3_test_02/./work_PR3_UQ_c (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_a (0 files differ)
mww3_test_02/./work_PR1_MPI_c (0 files differ)
mww3_test_02/./work_PR3_UNO_b (0 files differ)
mww3_test_02/./work_PR1_MPI_b (0 files differ)
mww3_test_02/./work_PR3_UQ_a_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_b_c (0 files differ)
mww3_test_02/./work_PR3_UQ_d_c (0 files differ)
mww3_test_02/./work_PR1_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_d_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_d (0 files differ)
mww3_test_02/./work_PR3_UQ_d (0 files differ)
mww3_test_02/./work_PR3_UNO_d_c (0 files differ)
mww3_test_02/./work_PR3_UQ_c_c (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_b (0 files differ)
mww3_test_02/./work_PR3_UNO_MPI_c (0 files differ)
mww3_test_02/./work_PR2_UQ_b (0 files differ)
mww3_test_02/./work_PR3_UNO_c (0 files differ)
mww3_test_02/./work_PR3_UNO_a (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_d (0 files differ)
mww3_test_02/./work_PR2_UQ_MPI_c (0 files differ)
mww3_test_02/./work_PR2_UNO_a (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_c (0 files differ)
mww3_test_02/./work_PR2_UNO_MPI_a (0 files differ)
mww3_test_02/./work_PR3_UQ_b_c (0 files differ)
mww3_test_02/./work_PR3_UQ_MPI_b (0 files differ)
mww3_test_03/./work_PR1_MPI_e (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_e (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR1_MPI_d2 (6 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c (15 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c (16 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2 (15 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2 (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2 (15 files differ)
ww3_ta1/./work_UPD0F_O (0 files differ)
ww3_ta1/./work_UPD0F_U (0 files differ)
ww3_ta1/./work_UPD2_U_cap (0 files differ)
ww3_ta1/./work_UPD3_U (0 files differ)
ww3_ta1/./work_UPD5_U_cap (0 files differ)
ww3_ta1/./work_UPD6_O (0 files differ)
ww3_ta1/./work_UPD2_U (0 files differ)
ww3_ta1/./work_UPD5_O (0 files differ)
ww3_ta1/./work_UPD5_U (0 files differ)
ww3_ta1/./work_UPD3_U_cap (0 files differ)
ww3_ta1/./work_UPD3_O (0 files differ)
ww3_ta1/./work_UPD6_U (0 files differ)
ww3_ta1/./work_UPD2_O (0 files differ)
ww3_ta1/./work_UPD6_U_cap (0 files differ)
ww3_tp2.10/./work_MPI_OMPH (6 files differ)
ww3_tp2.16/./work_MPI_OMPH (4 files differ)
ww3_tp2.17/./work_ma (1 files differ)
ww3_tp2.17/./work_a (1 files differ)
ww3_tp2.17/./work_mc1 (1 files differ)
ww3_tp2.17/./work_mb (1 files differ)
ww3_tp2.17/./work_mc (1 files differ)
ww3_tp2.17/./work_ma1 (1 files differ)
ww3_tp2.17/./work_c (1 files differ)
ww3_tp2.17/./work_b (1 files differ)
ww3_ufs1.3/./work_a (3 files differ)
**********************************************************************
************************ identical cases *****************************
**********************************************************************
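The annotation above was done by hand. As a rough illustration of the same triage, the sketch below splits summary lines into the (0 files differ) entries, which the note attributes to linked input files, and entries with an actual nonzero count. The function name `summarize` and the regular expression are hypothetical; they assume each entry has the `test/./work_dir (N files differ)` shape seen in the output above and are not part of matrix.comp.

```python
import re
from collections import defaultdict

# Assumed entry shape, taken from the summary above:
#   mww3_test_02/./work_PR1_c                     (0 files differ)
ENTRY_RE = re.compile(
    r"^\s*(?P<test>[^/\s]+)/\./(?P<work>\S+)\s+\((?P<n>\d+) files differ\)\s*$"
)


def summarize(lines):
    """Split summary entries into zero-count and nonzero-count groups.

    Returns two dicts keyed by test name, each mapping to a list of
    (work_dir, n_files) tuples. Lines that do not look like an entry
    (banners, separators, annotations) are skipped.
    """
    zero_count = defaultdict(list)
    nonzero_count = defaultdict(list)
    for line in lines:
        match = ENTRY_RE.match(line)
        if match is None:
            continue
        n_files = int(match.group("n"))
        bucket = zero_count if n_files == 0 else nonzero_count
        bucket[match.group("test")].append((match.group("work"), n_files))
    return dict(zero_count), dict(nonzero_count)


if __name__ == "__main__":
    sample = [
        "mww3_test_02/./work_PR1_c                     (0 files differ)",
        "mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)",
        "ww3_tp2.17/./work_ma                     (1 files differ)",
    ]
    zero, nonzero = summarize(sample)
    print("zero-count tests:", sorted(zero))
    print("nonzero-count tests:", sorted(nonzero))
```

Whether a nonzero entry is an expected difference (for example the unstructured mod_def cases) still has to be judged by a reviewer; the sketch only separates the two groups.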
Awesome, thanks @thesser1!
I was able to get this working on our systems.
Awesome. Thanks for your review @sbrus89. I'm curious whether you compiled with GNU or another compiler (Intel)?
@MatthewMasarik-NOAA, I used GNU 10.2.0
Great, thanks for confirming. And thanks for your work helping test this @sbrus89!
It is great to have confirmation with different compilers.
Thank you @aliabdolali and @aronroland for adding the SCOTCH capability. Thank you to @AlexanderRichert-NOAA @aerorahul @MatthewMasarik-NOAA and the many others who helped get us through cmake, orion and other issues and get this PR past the finish line! Thank you to @ukmo-ccbunney @sbrus89 @thesser1 for your reviews! This is a huge accomplishment, and we look forward to continuing to work on the outstanding scaling and MPI b4b issues (which are likely the cause of the diffs between SCOTCH and METIS).
Yay, finally! This PR hit a record!
Pull Request Summary
Add a SCOTCH option for geographical domain decomposition parallelization in unstructured WW3.
Description
The SCOTCH library has been added as an option for domain decomposition in unstructured WW3.
When PDLIB is used, one dependent switch (METIS or SCOTCH) is required.
The GNU and CMake builds are extended to support PDLIB/SCOTCH and PDLIB/METIS.
The existing tests with PDLIB (ww3_tp2.17, ww3_tp2.6 and ww3_tp2.21) now use SCOTCH by default. One test is added to ww3_tp2.21 to check METIS.
authors: @aliabdolali and @aronroland
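The switch requirement described above can be illustrated with a small check. This is a minimal sketch, not the logic in model/src/CMakeLists.txt: the function name `check_pdlib_partitioner` is hypothetical, it assumes a switch file is a whitespace-separated list of tokens, and it assumes that giving both METIS and SCOTCH together should be rejected (the description only states that one of them is required).

```python
PARTITIONERS = {"METIS", "SCOTCH"}


def check_pdlib_partitioner(switches):
    """Return the partitioner switch paired with PDLIB.

    Returns None when PDLIB is not in the switch list. Raises ValueError
    when PDLIB is present without exactly one of METIS or SCOTCH
    (rejecting both at once is an assumption of this sketch).
    """
    tokens = set(switches)
    if "PDLIB" not in tokens:
        return None
    present = sorted(PARTITIONERS & tokens)
    if len(present) != 1:
        found = ", ".join(present) if present else "none"
        raise ValueError(
            f"PDLIB requires exactly one of METIS or SCOTCH (found: {found})"
        )
    return present[0]


if __name__ == "__main__":
    # Illustrative switch line; the tokens are examples, not a real switch file.
    line = "NOGRB DIST MPI PDLIB SCOTCH PR3 UQ"
    print(check_pdlib_partitioner(line.split()))
```

A check of this kind would flag the situation discussed earlier in the conversation, where a switch file with PDLIB had neither METIS nor SCOTCH.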
Issue(s) addressed
Commit Message
Add SCOTCH library for geographical domain decomposition
Check list
Testing
Tested with the Intel and gfortran compilers, using both the GNU and CMake builds.
SCOTCH is added to the default input/switch_PDLIB for ww3_tp2.17, ww3_tp2.6 and ww3_tp2.21.
One test using ww3_tp2.21/input/switch_PDLIB_METIS is added to check ParMETIS.
HERA with intel compiler and hpc-stack
ww3_tp2.17, ww3_tp2.6 and ww3_tp2.21