Scaling for unstructured grids using SCOTCH for domain decomposition #879
Comments
Hi Matthew,
so it seems that you are running out of memory, as you anticipated. The memcheck option should show the issues; moreover, debug compile flags should be used to check what is happening. The point is that you can inquire about the memory usage with the sysadmins, since it is known how much memory this job has taken. I just ran into the same issue on DATARMOR and the error was also the same; the reason was not enough memory.
Now, normally memory usage should go down with the number of cores; however, it seems that with SCOTCH there is some additional memory usage. I will help soon on the issue, but first need to get a significant number of pull requests in line.
Cheers, and many thanks for this precise and nice insight into this problem.
Aron
From: Matthew Masarik ***@***.***>
Sent: Thursday, 2 February 2023 23:48
To: NOAA-EMC/WW3 ***@***.***>
Cc: Aron Roland ***@***.***>; Mention ***@***.***>
Subject: [NOAA-EMC/WW3] Scaling for unstructured grids using SCOTCH for domain decomposition (Issue #879)
Describe the bug
Running WW3 with unstructured grids, using the SCOTCH <https://gitlab.inria.fr/scotch/scotch> mesh/hypergraph partitioning library for MPI domain decomposition, scales to ~2K cores, grid size dependent. Above this core count WW3 will fail during model initialization.
This behavior was found during scaling simulations in which the allowable resources are ~8K cores. Experiments for two separate meshes, unst1 = ~0.5M nodes and unst2 = ~1.8M nodes, were conducted on hera. I was unable to run the same experiments on another HPC machine (there are ongoing issues with building WW3/SCOTCH on orion, and SCOTCH is currently not available on WCOSS2, which are the machines I have access to).
Note: ParMetis <https://github.com/KarypisLab/ParMETIS>, which is the partitioning library SCOTCH is replacing, was able to scale out to ~8K cores for each of the grids.
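For context, below is a minimal sketch of the kind of distributed partitioning call that SCOTCH (via its PT-SCOTCH library) provides in place of ParMetis. This is not WW3's actual call path; the function names follow the PT-SCOTCH user manual, the variable names are illustrative, and the distributed CSR arrays are assumed to be already built on each MPI rank:

```c
#include <stdio.h>
#include <mpi.h>
#include <ptscotch.h>   /* required headers/order may differ by SCOTCH version */

/* Partition the locally held part of a distributed graph into partnbr parts. */
int partition_local_graph (MPI_Comm comm,
                           SCOTCH_Num  vertlocnbr,   /* number of local vertices            */
                           SCOTCH_Num *vertloctab,   /* local CSR index, size vertlocnbr+1  */
                           SCOTCH_Num  edgelocnbr,   /* number of local arcs                */
                           SCOTCH_Num *edgeloctab,   /* global neighbor ids, size edgelocnbr */
                           SCOTCH_Num  partnbr,
                           SCOTCH_Num *partloctab)   /* output: part index per local vertex */
{
  SCOTCH_Dgraph grafdat;
  SCOTCH_Strat  stradat;

  if (SCOTCH_dgraphInit  (&grafdat, comm) != 0) return 1;
  if (SCOTCH_dgraphBuild (&grafdat, 0,               /* baseval = 0 (C numbering)   */
                          vertlocnbr, vertlocnbr,    /* vertlocnbr, vertlocmax      */
                          vertloctab, NULL,          /* compact CSR, no vendloctab  */
                          NULL, NULL,                /* no vertex weights or labels */
                          edgelocnbr, edgelocnbr,
                          edgeloctab, NULL, NULL) != 0) return 1;
  SCOTCH_stratInit (&stradat);                       /* default partitioning strategy */
  if (SCOTCH_dgraphPart (&grafdat, partnbr, &stradat, partloctab) != 0) return 1;
  SCOTCH_stratExit (&stradat);
  SCOTCH_dgraphExit (&grafdat);
  return 0;
}
```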
To Reproduce
1. Build SCOTCH
2. Build WW3 with SCOTCH
3. Run the executable with core counts (= MPI tasks) >~2K
Expected behavior
WW3 will error and core dump.
* SCOTCH build instructions for (Intel) hera
# https://gitlab.inria.fr/scotch/scotch.git
cd scotch
module purge
module load cmake/3.20.1
module load intel/2022.1.2
module load impi/2022.1.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
module load hdf5/1.10.6
module load netcdf/4.7.4
module load gnu/9.2.0
mkdir build && cd build
cmake -DCMAKE_Fortran_COMPILER=ifort \
-DCMAKE_C_COMPILER=icc \
-DCMAKE_INSTALL_PREFIX=<path-to>/install \
-DCMAKE_BUILD_TYPE=Release .. |& tee cmake.out
make VERBOSE=1 |& tee make.out
make install
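For step 2 of the reproduction (building WW3 against this SCOTCH install), a rough sketch is given below. The switch file name is a placeholder, and the assumption that the WW3 CMake build picks up the SCOTCH location from the SCOTCH_PATH environment variable is based on the job card further down:

```bash
export SCOTCH_PATH=<path-to>/install     # same prefix as the SCOTCH install above
cd WW3 && mkdir build && cd build
cmake .. -DSWITCH=<switch-file-with-PDLIB/SCOTCH> |& tee cmake.out
make -j8 |& tee make.out
```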
Screenshots
* hera environment used (job card)
#SBATCH -q batch
#SBATCH -t 08:00:00
#SBATCH --cpus-per-task=1
#SBATCH -n 2400
#SBATCH --exclusive
module purge
module load cmake/3.20.1
module load intel/2022.1.2
module load impi/2022.1.2
module use /scratch1/NCEPDEV/nems/role.epic/hpc-stack/libs/intel-2022.1.2/modulefiles/stack
module load hpc/1.2.0
module load hpc-intel/2022.1.2
module load hpc-impi/2022.1.2
module load jasper/2.0.25
module load zlib/1.2.11
module load libpng/1.6.37
module load hdf5/1.10.6
module load netcdf/4.7.4
module load bacio/2.4.1
module load g2/3.4.5
module load w3emc/2.9.2
module load esmf/8.3.0b09
module load gnu/9.2.0
export SCOTCH_PATH=/scratch1/NCEPDEV/climate/Matthew.Masarik/waves/opt/hpc-stack/scotch/install
ulimit -s unlimited
ulimit -c 0
export KMP_STACKSIZE=2G
export FI_OFI_RXM_BUFFER_SIZE=128000
export FI_OFI_RXM_RX_SIZE=64000
export OMP_NUM_THREADS=1
* log output / error message
<https://user-images.githubusercontent.com/86749872/216459978-52d023fe-6bee-45ab-ba39-04c73af6aee6.png>
* Results from the two grids mentioned above. Both were run separately using SCOTCH, and ParMetis for decomposition.
* unst1
* SCOTCH: scaled to ~1800 cores.
* ParMetis: scaled through the allowable range, 8K cores.
* unst2
* SCOTCH: scaled to ~2200 cores.
* ParMetis: scaled through allowable range, 8K cores.
The plot below shows this behavior.
<https://user-images.githubusercontent.com/86749872/216462966-e6722057-6db1-4eba-89e7-5d29d78e399f.png>
Additional context
This stems from current PR #849 <#849> .
This issue is intended to be a place we can all collect information. @aliabdolali <https://github.com/aliabdolali> @aronroland <https://github.com/aronroland> please share any information you've learned working on this topic.
TODO
* For the unstructured meshes, OMP threads cannot be used. However, to test whether this is memory related, I will be re-running the experiments and adding more than one cpu-per-task, with one thread, to provide more memory per task.
* Another possible detail: the serial versions of the Intel compilers (ifort/icc) are passed to cmake. I believe from the variable names this is correct, though I'm not 100% certain that they shouldn't be the MPI wrapper names, mpiifort/mpiicc.
|
Hi Aron, That is very helpful information. Thank you for sharing your experience. I have those re-runs in the works, so I'll report back when I have the results. That would be amazing news if we can fix this just in the job card. Cheers and thanks again for the insight. |
Hi Matthew,
I have to thank you, you did a nice job on that. I also discussed with Ali, and I think the first step should be to check with your admins and run memcheck on that. I am sorry that I am tied up with the other stuff; I would like to jump in, but actually I find that I totally rely on you guys, since I have 450 CPUs max. and not more!
Cheers
Aron
|
@aronroland we understand if you have other priorities, but this is a high priority item for us and we'll continue to work on this. It'd be great if you could include us (@MatthewMasarik-NOAA and myself) on your conversations with the SCOTCH developers on this issue. |
Hi Aron, I was able to get some runs in yesterday and will be able to share the results here this afternoon. Please stay tuned. Thanks |
Hi Jessica,
Ok, this is no problem. As for the schedule, this is also super important for me and my work. The only problem is that I cannot run this myself on such a large core count. Otherwise, with respect to SCOTCH, I will write some mail and introduce you to the SCOTCH team. They maintain SCOTCH in quite a similar way as we do, but not on GitHub, therefore we need to get access for you there. Once we are as sure as possible about the nature of the problem, we can engage them via their ticketing system. So let me do that for Matthew and you.
Cheers
Aron
|
Following @aronroland's initial comments above I performed some more scaling runs where I added cores to tasks to increase the memory (though the OMP thread count stays at 1 for unstructured grids, so cores != tasks x threads for the new cases). The new cases run are cores-per-task=<2,4>; these are the only cases that made sense to me -- cores-per-task=1 is what has already been run, and cores-per-task=6 (or above) would eat up too many cores for memory alone, leaving the corresponding task count too low for performance.
I attempted to run total core counts between 1K -- 8K. All 4-core runs completed. In the 2-core runs the model crashed after ~4K, in the same manner as before. The table gives the parameter details for the highest scaling/best performance for each of the cores-per-task cases.

| cores-per-task | max tot cores | mpi tasks | min runtime/sim day |
|---|---|---|---|
| 1 | 2200 | 2200 | 992 sec, ~17 min |
| 2 | 4000 | 2000 | 933 sec, ~16 min |
| 4 | 8000 | 2000 | 903 sec, ~15 min |

<https://user-images.githubusercontent.com/86749872/216684278-1cc270ef-a272-416a-959f-486dd1c16fdc.png>
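For reference, a minimal sketch of how the cores-per-task=4 case might be requested with SLURM on hera; the exact flag combination and executable name are illustrative assumptions, since the job cards used for these re-runs are not shown in the thread:

```bash
#SBATCH -q batch
#SBATCH -t 08:00:00
#SBATCH -n 2000               # MPI tasks
#SBATCH --cpus-per-task=4     # reserve 4 cores per task -> ~4x memory per rank
#SBATCH --exclusive
export OMP_NUM_THREADS=1      # unstructured grids: thread count stays at 1
srun -n 2000 ./ww3_shel       # total cores = 2000 tasks x 4 cores = 8000
```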
|
Hi Matthew,
hmm, so memory seems not to be the issue, otherwise we should pass more active cores using the 4 cores per task, right? Let's see what the debug flags will show. But it was a really good try. In the best case the SCOTCH build should also be compiled in debug mode.
Cheers
Aron
|
Hi @aronroland, quick clarification: I only performed runs with total core requests up to 8K because this is our rough upper limit of projected resources for GFSv17. Runs past 8K for the cores-per-task=4 case are unknown. |
@MatthewMasarik-NOAA can you go past the 8k cores with cores-per-task=4 and see how far you can push this? |
@arunchawla-NOAA yes, I would like to be able to try this. The max core request on hera is 8,400 cores, so to push out past 8K cores with cores-per-task=4, I think I would need to submit a request to do a run in hera's novel queue. |
Thanks Matt. A few things:
|
@arunchawla-NOAA @MatthewMasarik-NOAA
Here are my thoughts:
* We need to compile SCOTCH with debug flags, as consistent as possible with the Debug flags in WW3, and try it. We can then report to the SCOTCH developers the outcomes of our simulation with debug flags.
* I am not sure about the practical difference between ntasks-per-node and cores-per-task, but based on my experience after years of using unstructured WW3 at various scales and setup sizes, ntasks-per-node=20 or 30 out of 40 cores/node can help to have more memory per core if memory is an issue (see the sketch below). However, I do not think our problem is a memory issue.
* Testing on a different platform like acorn would be beneficial.
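To illustrate the ntasks-per-node idea above, a minimal sketch of undersubscribing hera's 40-core nodes so each MPI rank gets roughly twice the memory (flag values are illustrative, not the settings used in any run reported here):

```bash
#SBATCH --nodes=100
#SBATCH --ntasks-per-node=20   # leave 20 of 40 cores idle -> ~2x memory per rank
export OMP_NUM_THREADS=1
srun ./ww3_shel                # 100 nodes x 20 tasks = 2000 MPI tasks
```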
|
@aliabdolali you mentioned 2-3 weeks ago you'd be looking into checking the decomposition and running with more diagnostic output. Any update on that? @MatthewMasarik-NOAA I'm still not sure what running with the novel queue will tell us that we can't find out other ways, but there's no harm in making that request and doing that run. For the runs that are completing, it'd be interesting to see the memory usage (and perhaps compare it with ParMetis memory usage) since memory usage still seems to be a theory. I think I lost this, but what happens if we run with 2 threads and more than 2,000 MPI tasks? Or does that also need the novel queue? |
Hi All,
I have some answers from Francois about these things, but we need to have a clear track on the actual status. Since I cannot run it on my own, I would be happy if somebody would do the following:
1. Run with full debug flags
2. Run with simplified debug flags
3. Run 1. and 2. with the "memcheck" option (see the sketch below)
Please provide all error messages that you have, and let's do another iteration to get a decent description of the problem for our colleagues. As for the memory issue, I am totally with you, Matthew, but either I am missing something or I would conclude that you did all the needed steps and we are left with a max core count of about 2K with SCOTCH.
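One possible reading of the "memcheck" option above is Valgrind's memcheck tool run under the MPI launcher; a minimal sketch, assuming Valgrind is installed on the system (whether this is the tool Aron means is an assumption):

```bash
# run each MPI rank under valgrind/memcheck; expect a large slowdown
srun -n 2200 valgrind --tool=memcheck --track-origins=yes \
     --log-file=memcheck.%q{SLURM_PROCID}.log ./ww3_shel
```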
Cheers
Aron
|
Hi Arun,
I am in close contact with Francois, and I am totally with you ... we need to make sure that it is not on our end. Once we have summed up all the information I will include everybody in the conversation with the SCOTCH team so we are all on the same line. Otherwise we should as well have this issue posted on the GitLab side of INRIA to keep a sustainable development; I will do that part.
All this was already agreed with Francois, so we are ready to go once we have all the debug output etc. ... thanks everybody for pushing on that!
Cheers
Aron
|
@aronroland thanks for pushing it further to SCOTCH developers. |
@arunchawla-NOAA @JessicaMeixner-NOAA @aliabdolali @aronroland, Running in debug mode on orion is underway. Also, getting SCOTCH built on acorn and running a canned case there is highest priority. Regarding the MPI tasks, I want to clarify that I've just referred to 2K as roughly where runs start to die. In the case of 4 cores-per-task, I initially ran 8K cores with 2K tasks, and that was successful. I found hera's limit of 8400 cores when I wanted to see how far it could be pushed. I did one 4-core run at the 8400 limit (2100 tasks), and that was successful. Since other runs had made it to 2200 MPI tasks before dying, I was not surprised, so that's what I was referring to previously.
In the 2-core case, the runs crash. The next increment of resources I had tried with 2 cores was 2,200 MPI tasks. This count (and several other increments between 2200-4000 MPI tasks) all crashed. |
@MatthewMasarik-NOAA, @JessicaMeixner-NOAA, is there any news on the run using the debug flags? |
@aronroland I have been digging into it. I can give an update early afternoon. |
@aronroland - no news from me, working on debugging building scotch on orion which is the blocking issue for the PR. Any news from you? |
@JessicaMeixner-NOAA as I said, I cannot run on that many cores since I do not have access to more than 448 cores. I am waiting for the debug part so we can see the nature of the problem. Anything I missed that I should do? By the way, before I forget: it would be great if SCOTCH and WW3 could be built with the same debug flags, and if we could have a SCOTCH build for debugging. This would be really helpful. Thanks for your hard work on this issue. |
@aronroland do you think it'd be potentially interesting/useful to compare the memory usage between SCOTCH and ParMetis even on smaller node counts, to see if it's vastly different even for smaller numbers of cores? Still haven't heard anything from @aliabdolali, who was going to look into the decomposition and run with extra output, which could give us more information as well. |
We need SCOTCH outputs with debug, so we can ask SCOTCH developers to take a look. I think Aron and I asked for it days ago. I'd appreciate it if you do it at your earliest convenience, then we can continue. |
This would clearly be the next step, but, honestly, before we have the debug/traceback it is fishing in the dark. Otherwise, I had some memory issues with OASIS, and finally the sysadmin from DATARMOR was kind enough to give hints on the memory usage; could you check with your sysadmin whether he can tell you something about that when looking at the job ID? While you get the debug output, I will go on with the memory examination ... |
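On SLURM systems such as hera, the per-job memory high-water mark can usually be queried directly from job accounting without involving the sysadmin; a minimal sketch (the field selection is illustrative):

```bash
# peak resident and virtual memory per job step for a completed job
sacct -j <jobid> --format=JobID,JobName,NTasks,MaxRSS,MaxVMSize,State
```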
@JessicaMeixner-NOAA I was thinking more about the memory issue: we basically cannot see this without modifying the SCOTCH code. Without deep insight into the memory management of SCOTCH, it will not be helpful to look at WW3, since WW3's memory usage does not depend on SCOTCH or ParMetis. |
Hi @aronroland, as far as a list of the flags, they are just as they appear in those two lines from the CMakeLists.txt file. It may be that we need to look into the flags we are using, though for now, so we don't get sidetracked: I understood from Ali (meeting last Thu) that the run done at ERDC out to 8K cores used the standard WW3 cmake compile, so the flags listed. |
@MatthewMasarik-NOAA about the flags used at ERDC you should discuss with @thesser1. |
@aronroland, from my conversation with @aliabdolali last Thu he stated that for the particular run in question at ERDC, the standard flags had been used. Do you believe different flags were used? @aliabdolali @thesser1 can either of you confirm if the standard WW3 compile options were used for the 8K core run? |
From what I recall, Ty compiled SCOTCH the same way I did initially, and tested WW3 with its release flags. But I'll leave it to him to confirm. I usually use Aron's flags during development and debugging, as the WW3 standard flags (including debug) usually do not provide insightful info. |
I have run a scotch test with release flags as well as the debug and reldebug flags that Aron described. All are working on my machine. If you want the full output from the debug flags, I can provide it.
Ty
|
This just slipped. Let me be frank, Matt: you must know exactly which flags are used and why they are used. I think that some of the flags that are used, like the i4 / 32-bit stuff, are a very bad choice. They were most likely used to make the model b4b; we will now check if it is b4b using other flags. |
@aronroland, I wholeheartedly agree with your sentiment that we should know what flags we are using and why. I've tried to do some tests using the Intel debug flags you gave in the post, though I'm running into problems with the compile. Here's what I tried. First I tried using just those flags and no others, by removing the standard + debug flags and replacing them with yours. The compile failed. Next I tried putting back in the standard Intel flags and replacing only the debug flags with yours. This compile also failed. The Intel compiler doesn't seem to like / recognize some of the flags, so I am going to try removing those until the compile succeeds. I'll keep you posted. Have you and Ali been successful compiling with those compile flags on any NOAA machines? |
Matt, "the compile failed" -- can you please be more specific? You can just paste everything here that you got. I always use these flags because I know why I am using them and what I am using them for. Those flags have not been developed by me; they are developed by Intel with a clear purpose (see the Intel Fortran compiler manual). This I know precisely, and therefore I am able to interpret it decently. I would very much appreciate it if you share any kind of compiler problems/bugs in stdout, and my warm suggestion is not to go forward as long as we have compilation issues. I wish you would be willing to share this stdout with us; let me thank you in advance for your precious work. Btw, a NOAA machine is nothing special, it uses Intel CPUs and Mellanox or other IB network, so there is no magic there with "NOAA" being particular with respect to the hardware infrastructure. |
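For readers following along, a typical set of Intel Fortran debug options of the kind being discussed is shown below; Aron's exact flag list is not reproduced in this thread, so this is only an illustrative assumption:

```bash
# illustrative ifort debug options; see the Intel Fortran compiler manual for details
FFLAGS="-g -O0 -traceback -check all -fpe0 -init=snan,arrays -warn all"
```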
Thank you for an excellent meeting today. Here are the main points. When Tyler ran the system (ww3+scotch) using impi, he had failures similar to the ones that NOAA has had. The system works for sgimpt. So the following options have been suggested as the path forward:
-- We shall focus all attention on the debug build options so we can have adequate traceback options. Aron will provide the debug options we should use for the WW3 build (over the standard debug options that we have)
-- We should not use the cmake build option for scotch, but use one of the make build options that are available, for now, for debugging (again, Aron will direct us to which ones)
-- We shall wait for a newer instrumented version of SCOTCH from Francois (it was not clear if we should use the instrumented version of the code Francois had already provided or if he would provide a newer version)
-- EMC will provide the traceback error location using the impi library, in debug mode, so that we can know exactly where the problems are occurring. EMC will also test with other mpi libraries it has access to
-- Tyler will do the same with his machines. He will also test with the one mpi library that works (sgimpt)
-- Aron will provide options for how we can compile the mpi libraries in debug mode to see if that will provide more information
-- If these options do not provide an indication of where the problem is occurring, we will proceed to more detailed debugging using mpi-barriers and print statements
Thank you and please add anything I missed |
I tested @MatthewMasarik-NOAA's canned case (v7.0.3, not the version just emailed) with intel 18 on hera:
and it failed with:
|
Here is some output from running with the various SCOTCH_NOAA_DEBUG flags. It's likely these need to be re-run with additional compiler flags turned on for SCOTCH to get additional traceback information. All results below use the WW3 default debug cmake options and build SCOTCH with cmake in debug mode, Intel 18 (unintentionally changed from the above test, will re-run with 2022) and impi. Building with:
Building with:
Building with:
|
@JessicaMeixner-NOAA, @MatthewMasarik-NOAA, as we agreed yesterday, I have provided the how-to: building SCOTCH in debug and performance modes, and further instrumentation, is described in #964 in the discussion section, using GNU make. This of course implies that it needs to be applied in combination with #927 from the issue section. |
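For orientation, the traditional GNU-make build of SCOTCH generally follows the pattern sketched below; the exact Makefile.inc template and targets are assumptions on my part, and the authoritative instructions are the ones referenced in #964 and in the SCOTCH source tree:

```bash
cd scotch/src
cp Make.inc/Makefile.inc.x86-64_pc_linux2 Makefile.inc   # then edit CFLAGS for debug
make scotch ptscotch                                      # serial and MPI (PT-SCOTCH) libraries
make install prefix=<path-to>/install
```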
@aronroland Thanks for pointing this out, I missed the other thread with the build info despite looking for it. Happy to switch to these new build instructions and build flags. In the meantime I have some updates from running with Intel 2021 that I'll share, since those runs are in the queue. |
Hi @JessicaMeixner-NOAA, please correct/modify/add/question anything that is not clear, since I really would like to unify everything in such a way that it is understandable for everybody, and this may be difficult for me since I am deep inside of this and maybe I do not explain it in a way that is broadly understandable. Thanks for your help in advance. Saying this, I see that the c-flags in the debugging makefile for the impi part could be further expanded, but I would like to have this in the SCOTCH repo. Therefore I will experiment a bit with this part, adjust with the SCOTCH team, and provide a further expanded debug makefile for the "c" language using the Intel compiler and GNU make for SCOTCH. I think that we can go forward with this, but expect more Monday. I was also not sure if the "idea" section is the right place to put it, but I do not feel that this is like an issue. So feel free to move it anywhere else where you think it is appropriate. Thanks in advance. |
@MatthewMasarik-NOAA asked which compiler flags we should use and when. Thanks for this question. I have extended #927 in order to answer your important question. Please let me know if this helps. |
Here is output from building SCOTCH with CMAKE,
First run with the following:
Error output:
For an example of which flags are used in compilation here's a line from the scotch make output:
and from WW3: Second run with the following:
Error output:
Third run with the following:
Error output:
Fourth run with all 3:
error:
|
Hi all, this is a repost from the email thread in case it was missed. It is a traceback from when I started to test with GNU (meaning no Intel), so is a data point to compare with @JessicaMeixner-NOAA's Intel results just posted. It also shows the assessment of The SCOTCH routine we use is That routine calls It appears to be dying in From the traceback the first intelligible record is for Since the problem seems to be in |
For current efforts, I am working to get the GNU |
Hi @MatthewMasarik-NOAA, sure, let's set up a meeting with @JessicaMeixner-NOAA and check where we are with respect to the work schedule; maybe @thesser1 can join us and we can discuss the actual state of work. |
@aronroland I'm on leave Weds-Friday, so I'll set up a time for today. Should we invite Francois too, as we have some additional output that perhaps he can provide feedback on? |
Sure thing, @aronroland. I'm looking forward to discussing today. |
I am not available at 11:30 today, but I can update you that I was able to set up scotch on my cray computer running with the intel compiler and cray-mpich mpi library. I compiled both scotch and ww3 with debug flags as described and I put the SCOTCH_NOAA_DEBUG_1 and SCOTCH_NOAA_DEBUG_3 flags on during the build process. I was able to run the case smoothly to 4200 cores. When I tried to double it to 8000 cores, the run stalled. I hope to find time today to pull the output of the stalled runs so we can learn what changed.
|
SCOTCH_NOAA_DEBUG + SCOTCH GNU make
Results for running the three NOAA debug flags separately in the instrumented 'noaa2' SCOTCH repo. SCOTCH is built using the traditional/GNU make, along with each flag:
SCOTCH_NOAA_DEBUG_1: noaa-1-scotch-make.out.txt
SCOTCH_NOAA_DEBUG_2: noaa-2-scotch-make.out.txt
SCOTCH_NOAA_DEBUG_3: noaa-3-scotch-make.out.txt
|
I've run the test case building with intel/2022.1.2 and mvapich2/2.3 last week and again today and my job just hangs and I get no output. I believe this is consistent with what @thesser1 reported as well. |
Trying to understand why each of the three tests (
This is with flags
|
Update - Aron's debug flags
Reporting output for runs that use:
For these builds the
SCOTCH_NOAA_DEBUG_1: debug1.scotch.make.out.txt
SCOTCH_NOAA_DEBUG_2: debug2.scotch.make.out.txt
SCOTCH_NOAA_DEBUG_3: debug3.scotch.make.out.txt
The same runs were done with
Q: I've been compiling SCOTCH (both
Putting it together
From these tracebacks we are clearly having a crash in
And from the tracebacks with
with an error message mentioning
Line 1436 in
1434: if (matedat.c.multloctmp != NULL) {             /* If we allocated the multinode array */
1435:   matedat.c.multloctmp =
1436:   matedat.c.multloctab = memRealloc (matedat.c.multloctab, matedat.c.multlocnbr * sizeof (DgraphCoarsenMulti)); /* Resize multinode array */
1437: }
Is |
This issue was resolved within SCOTCH by release 7.0.4. A scotch/7.0.4 module has previously been added to spack-stack/1.5.0 and installed and tested on RDHPCS machines. scotch/7.0.4 has also been installed on WCOSS2. A final check to confirm scalability was done by @JessicaMeixner-NOAA on cactus (~22 Dec 2023) by running WW3 coupled with 6000 PETs for the wave component. |
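For anyone verifying the fix locally, a minimal check that the fixed module is available and loaded; the module names follow the comment above, while the spack-stack modulefile path on a given machine is an assumption left as a placeholder:

```bash
module use <spack-stack-1.5.0-modulefiles-path>   # machine-specific, placeholder
module spider scotch                              # list available scotch versions
module load scotch/7.0.4
```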