
Add machine config file for RZAnsel #377

Merged
merged 41 commits into develop from JoshuaSBrown/setup-rzansel-machine-config on Feb 3, 2021

Conversation

@JoshuaSBrown (Collaborator) commented Nov 23, 2020

PR Summary

This adds a machine config file for RZAnsel, which should make it easier for users to get up and running with a working build. Following the approach @pgrete used in the Summit config file, three variants can be built: "cuda", "mpi", and "cuda+mpi".

Note that I have not removed the old documentation for building on RZAnsel, because it was erroneously altered from the Summit documentation. Per my conversation with @pgrete, he was going to fix it, so I did not want to touch it.

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md

@JoshuaSBrown (Collaborator Author)

Waiting for this https://re-git.lanl.gov/eap-oss/parthenon-project/-/merge_requests/2 to be merged first.

@JoshuaSBrown (Collaborator Author) commented Jan 4, 2021

This is now ready for review with the exception of these lines:

#set(RZANSEL_PROJECT_PREFIX /usr/gapps/parthenon_shared/parthenon-project
#    CACHE STRING "Path to parthenon-project checkout")

set(RZANSEL_PROJECT_PREFIX /g/g15/brown338/Software/parthenon-project
    CACHE STRING "Path to parthenon-project checkout")

I will fix this once the parthenon-project repo is updated and the installation on RZAnsel is updated as well.

@JoshuaSBrown JoshuaSBrown changed the title WIP Add machine config file for RZAnsel Add machine config file for RZAnsel Jan 4, 2021
@JoshuaSBrown (Collaborator Author)

Ok, this is good to go.

Joshua S Brown and others added 2 commits January 19, 2021 16:18
Co-authored-by: Philipp Grete <gretephi@msu.edu>
Co-authored-by: Philipp Grete <gretephi@msu.edu>
@JoshuaSBrown (Collaborator Author) commented Jan 20, 2021

@pgrete I could use some help with this: either there is some logic in parthenon that needs to be corrected, or the tests simply are not configured to run with 4 GPUs and 4 MPI ranks:

This first scenario should be the one that works:

I can verify that nvidia-smi indicates 4 tasks are launched.

    Start 34: regression_mpi_test:advection_performance
1/6 Test #34: regression_mpi_test:advection_performance ...***Failed   68.00 sec


test_dir=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance']
output_dir='/g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi'
driver=['/g/g15/brown338/Software/parthenon/build/example/advection/advection-example']
driver_input=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance']
kokkos_args=['--kokkos-num-devices=1 --kokkos-threads=1']
num_steps=5
mpirun=['/usr/tcetmp/bin/jsrun']
mpirun_opts=['-a', '4', "-c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu'"]
coverage=False
*****************************************************************
Beginning Python regression testing script
*****************************************************************

Initializing Test Case
Using:
driver at:       /g/g15/brown338/Software/parthenon/build/example/advection/advection-example
driver input at: /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance
test folder:     /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance
output sent to:  /g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi

Make output folder in test if does not exist
*****************************************************************
Preparing Test Case Step 1
*****************************************************************

*****************************************************************
Running Driver
*****************************************************************

Command to execute driver
/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance parthenon/mesh/nx1=256 parthenon/meshblock/nx1=256 parthenon/mesh/nx2=256 parthenon/meshblock/nx2=256 parthenon/mesh/nx3=256 parthenon/meshblock/nx3=256 --kokkos-num-devices=1 --kokkos-threads=1

*****************************************************************
Subprocess error message
*****************************************************************

b'### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
'

    Start 34: regression_mpi_test:advection_performance
1/6 Test #34: regression_mpi_test:advection_performance ...***Failed   66.39 sec


test_dir=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance']
output_dir='/g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi'
driver=['/g/g15/brown338/Software/parthenon/build/example/advection/advection-example']
driver_input=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance']
kokkos_args=['--kokkos-num-devices=4 --kokkos-threads=1']
num_steps=5
mpirun=['/usr/tcetmp/bin/jsrun']
mpirun_opts=['-a', '4', "-c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu'"]
coverage=False
*****************************************************************
Beginning Python regression testing script
*****************************************************************

Initializing Test Case
Using:
driver at:       /g/g15/brown338/Software/parthenon/build/example/advection/advection-example
driver input at: /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance
test folder:     /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance
output sent to:  /g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi

Make output folder in test if does not exist
*****************************************************************
Preparing Test Case Step 1
*****************************************************************

*****************************************************************
Running Driver
*****************************************************************

Command to execute driver
/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance parthenon/mesh/nx1=256 parthenon/meshblock/nx1=256 parthenon/mesh/nx2=256 parthenon/meshblock/nx2=256 parthenon/mesh/nx3=256 parthenon/meshblock/nx3=256 --kokkos-num-devices=4 --kokkos-threads=1

*****************************************************************
Subprocess error message
*****************************************************************

b''

*****************************************************************
Error detected while running subprocess command
*****************************************************************

Traceback (most recent call last):
  File "/g/g15/brown338/Software/parthenon/tst/regression/utils/test_case.py", line 237, in Run
    proc = subprocess.run(run_command, check=True, stdout=PIPE, stderr=PIPE)
  File "/usr/gapps/parthenon_shared/parthenon-project/views/rzansel/ppc64le/gcc8/2021-01-04/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/tcetmp/bin/jsrun', '-a', '4', '-c', '1', '-n', '1', '-g', '1', '-r', '1', '-d', 'packed', "--smpiargs='-gpu'", '/g/g15/brown338/Software/parthenon/build/example/advection/advection-example', '-i', '/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance', 'parthenon/mesh/nx1=256', 'parthenon/meshblock/nx1=256', 'parthenon/mesh/nx2=256', 'parthenon/meshblock/nx2=256', 'parthenon/mesh/nx3=256', 'parthenon/meshblock/nx3=256', '--kokkos-num-devices=4', '--kokkos-threads=1']' returned non-zero exit status 134.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/g/g15/brown338/Software/parthenon/tst/regression/run_test.py", line 152, in <module>
    main(**vars(args))
  File "/g/g15/brown338/Software/parthenon/tst/regression/run_test.py", line 76, in main
    test_manager.Run()
  File "/g/g15/brown338/Software/parthenon/tst/regression/utils/test_case.py", line 247, in Run
    raise TestManagerError('\nReturn code {0} from command \'{1}\''
utils.test_case.TestManagerError: 
Return code 134 from command '/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance parthenon/mesh/nx1=256 parthenon/meshblock/nx1=256 parthenon/mesh/nx2=256 parthenon/meshblock/nx2=256 parthenon/mesh/nx3=256 parthenon/meshblock/nx3=256 --kokkos-num-devices=4 --kokkos-threads=1'
    Start 34: regression_mpi_test:advection_performance
1/6 Test #34: regression_mpi_test:advection_performance ...***Failed   68.12 sec


test_dir=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance']
output_dir='/g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi'
driver=['/g/g15/brown338/Software/parthenon/build/example/advection/advection-example']
driver_input=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance']
kokkos_args=['--kokkos-num-devices=1 --kokkos-threads=1']
num_steps=5
mpirun=['/usr/tcetmp/bin/jsrun']
mpirun_opts=['-a', '4', "-c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu'"]
coverage=False
*****************************************************************
Beginning Python regression testing script
*****************************************************************

Initializing Test Case
Using:
driver at:       /g/g15/brown338/Software/parthenon/build/example/advection/advection-example
driver input at: /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance
test folder:     /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance
output sent to:  /g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_performance_mpi

Make output folder in test if does not exist
*****************************************************************
Preparing Test Case Step 1
*****************************************************************

*****************************************************************
Running Driver
*****************************************************************

Command to execute driver
/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_performance/parthinput.advection_performance parthenon/mesh/nx1=256 parthenon/meshblock/nx1=256 parthenon/mesh/nx2=256 parthenon/meshblock/nx2=256 parthenon/mesh/nx3=256 parthenon/meshblock/nx3=256 --kokkos-num-devices=1 --kokkos-threads=1

*****************************************************************
Subprocess error message
*****************************************************************

b'### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
### PARTHENON ERROR
  Message:     ### FATAL ERROR in Mesh constructor
Too few mesh blocks: nbtotal (1) < nranks (4)

  File:        ../src/mesh/mesh.cpp
  Line number: 450
'

@jonahm-LANL I could use your input here as well.

@pgrete (Collaborator) commented Jan 20, 2021

This simple advection_performance test is currently set up to measure the overhead of overdecomposition, i.e., for a fixed mesh size (here 256^3) the number of MeshBlocks into which the Mesh is split is successively increased (by decreasing the block size).
Thus, this test assumes that the number of compute elements (e.g., a GPU) remains constant and, at the same time, that there is only one compute element (as the performance baseline is a single MeshBlock covering the entire Mesh).
We'd need to set up several different tests to capture the various performance aspects (I'm specifically thinking of the proxy app here, given that it will allow us to increase the compute- versus infrastructure-related pieces of the code to more realistic levels).
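
For illustration only (this is a hypothetical sketch, not code from the test suite, and it assumes the block size is halved at each of the five steps of the fixed 256^3 mesh), the overdecomposition sweep described above looks roughly like this, which also shows why the first step cannot run with four ranks:

    # Hypothetical illustration of the overdecomposition sweep described above
    # (not the actual test logic; the real parameters live in the test suite).
    mesh = 256       # fixed mesh size per dimension
    num_ranks = 4    # ranks used in the failing jsrun invocation above

    for step in range(5):
        block = mesh // 2**step          # block size: 256, 128, 64, 32, 16
        nbtotal = (mesh // block) ** 3   # number of MeshBlocks: 1, 8, 64, 512, 4096
        status = "ok" if nbtotal >= num_ranks else "FATAL: nbtotal < nranks"
        print(f"block={block:3d}  nbtotal={nbtotal:5d}  {status}")

    # With 4 ranks, the very first step (a single MeshBlock covering the whole
    # Mesh) already violates nbtotal >= nranks, which is exactly the
    # "Too few mesh blocks: nbtotal (1) < nranks (4)" error in the logs above.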

@AndrewGaspar (Contributor)

@pgrete Yeah, that's all well and good - we should definitely ensure we're holding the number of GPUs constant when doing performance testing, which you can control using NUM_GPU_DEVICES_PER_NODE. I think all Josh is trying to ask is how to get jsrun to correctly partition the GPUs.

@pgrete (Collaborator) commented Jan 22, 2021

> @pgrete Yeah, that's all well and good - we should definitely ensure we're holding the number of GPUs constant when doing performance testing, which you can control using NUM_GPU_DEVICES_PER_NODE. I think all Josh is trying to ask is how to get jsrun to correctly partition the GPUs.

Sorry, my previous comment was probably not clear.
I was trying to say that this specific advection_performance test only makes sense when run on a single GPU, as the baseline in that test uses a single MeshBlock for the entire Mesh. Thus, independent of CPU or GPU, that test can only be run using a single rank (that is also where the error message comes from).
So this test should not be run with
/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338
but rather with
/usr/tcetmp/bin/jsrun -a 1 -c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338

It may be worth adding a safety check (i.e., ranks == 1) to that test, similar to the advection_convergence test (a possible version is sketched after the excerpt below):

        # make sure we can evenly distribute the MeshBlock sizes
        err_msg = "Num ranks must be multiples of 2 for convergence test." 
        assert parameters.num_ranks == 1 or parameters.num_ranks % 2 == 0, err_msg                                             
        # ensure a minimum block size of 4
        assert lin_res[0] / parameters.num_ranks >= 4, "Use <= 8 ranks for convergence test."
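
For concreteness, a minimal sketch of such a guard for advection_performance (assuming the same parameters.num_ranks attribute used in the excerpt above; the exact message and placement are illustrative):

    # Sketch only: restrict advection_performance to a single rank, mirroring
    # the style of the advection_convergence asserts above.
    err_msg = ("advection_performance measures overdecomposition overhead on a "
               "single compute element; run it with exactly 1 rank.")
    assert parameters.num_ranks == 1, err_msg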

@JoshuaSBrown JoshuaSBrown changed the title Add machine config file for RZAnsel [WIP] Add machine config file for RZAnsel Jan 27, 2021
@JoshuaSBrown JoshuaSBrown changed the title [WIP] Add machine config file for RZAnsel Add machine config file for RZAnsel Feb 1, 2021
@JoshuaSBrown (Collaborator Author)

Thanks, everyone, for the feedback. @pgrete, do you want to take a last look over it?

@pgrete (Collaborator) commented Feb 1, 2021

I think making sure that the advection_performance test in its current version is only run by a single rank is good!
With respect to the advection_convergence test, this change is not required (this was probably not clear from our discussion on Matrix). In fact, advection_convergence is designed to work with a single rank as well as with 2, 4, and 8 ranks. So I think we should not set that number to 1.

@JoshuaSBrown (Collaborator Author) commented Feb 1, 2021

> I think making sure that the advection_performance test in its current version is only run by a single rank is good!
> With respect to the advection_convergence test, this change is not required (this was probably not clear from our discussion on Matrix). In fact, advection_convergence is designed to work with a single rank as well as with 2, 4, and 8 ranks. So I think we should not set that number to 1.

Ok, well I'm thoroughly confused, because when I run the convergence tests with 4 ranks and 4 GPUs, it is still only making use of a single GPU. I'm also pretty sure I have the correct command, because the restart test will actually utilize all four GPUs with the same command.

@pgrete (Collaborator) commented Feb 1, 2021

> > I think making sure that the advection_performance test in its current version is only run by a single rank is good!
> > With respect to the advection_convergence test, this change is not required (this was probably not clear from our discussion on Matrix). In fact, advection_convergence is designed to work with a single rank as well as with 2, 4, and 8 ranks. So I think we should not set that number to 1.
>
> Ok, well I'm thoroughly confused, because when I run the convergence tests with 4 ranks and 4 GPUs, it is still only making use of a single GPU. I'm also pretty sure I have the correct command, because the restart test will actually utilize all four GPUs with the same command.

Let me double check in practice. At least at first sight I don't see a difference in the parameter/CMake files that could cause that behavior.

@JoshuaSBrown (Collaborator Author)


/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/restart/parthinput.restart parthenon/job/problem_id=gold --kokkos-num-devices=4 --kokkos-threads=1

Gives me

Mon Feb  1 13:10:48 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.95.01    Driver Version: 440.95.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000004:04:00.0 Off |                    0 |
| N/A   29C    P0    51W / 300W |    446MiB / 16160MiB |      5%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000004:05:00.0 Off |                    0 |
| N/A   29C    P0    50W / 300W |    446MiB / 16160MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  On   | 00000035:03:00.0 Off |                    0 |
| N/A   29C    P0    49W / 300W |    446MiB / 16160MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  On   | 00000035:04:00.0 Off |                    0 |
| N/A   30C    P0    48W / 300W |    446MiB / 16160MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    132066      C   ...ild/example/advection/advection-example   435MiB |
|    1    132067      C   ...ild/example/advection/advection-example   435MiB |
|    2    132068      C   ...ild/example/advection/advection-example   435MiB |
|    3    132069      C   ...ild/example/advection/advection-example   435MiB |
+-----------------------------------------------------------------------------+

But with advection_convergence:

41: Test command: /usr/gapps/parthenon_shared/parthenon-project/views/rzansel/ppc64le/gcc8/2021-01-04/bin/python3.8 "/g/g15/brown338/Software/parthenon/tst/regression/run_test.py" "--mpirun" "/usr/tcetmp/bin/jsrun" "--mpirun_opts=-a" "--mpirun_opts=4" "--mpirun_opts=-c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu'" "--driver" "/g/g15/brown338/Software/parthenon/build/example/advection/advection-example" "--driver_input" "/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection" "--num_steps" "25" "--test_dir" "/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence" "--output_dir" "/g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_convergence_mpi" "--kokkos_args=--kokkos-num-devices=4 --kokkos-threads=1"
41: Test timeout computed to be: 1500
41: 
41: 
41: test_dir=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence']
41: output_dir='/g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_convergence_mpi'
41: driver=['/g/g15/brown338/Software/parthenon/build/example/advection/advection-example']
41: driver_input=['/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection']
41: kokkos_args=['--kokkos-num-devices=4 --kokkos-threads=1']
41: num_steps=25
41: mpirun=['/usr/tcetmp/bin/jsrun']
41: mpirun_opts=['-a', '4', "-c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu'"]
41: coverage=False
41: *****************************************************************
41: Beginning Python regression testing script
41: *****************************************************************
41: 
41: Initializing Test Case
41: Using:
41: driver at:       /g/g15/brown338/Software/parthenon/build/example/advection/advection-example
41: driver input at: /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection
41: test folder:     /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence
41: output sent to:  /g/g15/brown338/Software/parthenon/build/tst/regression/outputs/advection_convergence_mpi
41: 
41: Make output folder in test if does not exist
41: *****************************************************************
41: Preparing Test Case Step 1
41: *****************************************************************
41: 
41: *****************************************************************
41: Running Driver
41: *****************************************************************
41: 
41: Command to execute driver
41: /usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection parthenon/mesh/nx1=32 parthenon/meshblock/nx1=32 parthenon/mesh/nx2=1 parthenon/meshblock/nx2=1 parthenon/mesh/nx3=1 parthenon/meshblock/nx3=1 Advection/vy=0.0 Advection/vz=0.0 --kokkos-num-devices=4 --kokkos-threads=1
41: 
41: *****************************************************************
41: Subprocess error message
41: *****************************************************************
41: 
41: b'### PARTHENON ERROR
41:   Message:     ### FATAL ERROR in Mesh constructor
41: Too few mesh blocks: nbtotal (1) < nranks (4)
41: 
41:   File:        ../src/mesh/mesh.cpp
41:   Line number: 450
41: ### PARTHENON ERROR
41:   Message:     ### FATAL ERROR in Mesh constructor
41: Too few mesh blocks: nbtotal (1) < nranks (4)
41: 
41:   File:        ../src/mesh/mesh.cpp
41:   Line number: 450
41: ### PARTHENON ERROR
41:   Message:     ### FATAL ERROR in Mesh constructor
41: Too few mesh blocks: nbtotal (1) < nranks (4)
41: 
41:   File:        ../src/mesh/mesh.cpp
41:   Line number: 450
41: ### PARTHENON ERROR
41:   Message:     ### FATAL ERROR in Mesh constructor
41: Too few mesh blocks: nbtotal (1) < nranks (4)
41: 
41:   File:        ../src/mesh/mesh.cpp
41:   Line number: 450
41: '
41: 
41: *****************************************************************
41: Error detected while running subprocess command
41: *****************************************************************
41: 
41: Traceback (most recent call last):
41:   File "/g/g15/brown338/Software/parthenon/tst/regression/utils/test_case.py", line 235, in Run
41:     proc = subprocess.run(run_command, check=True, stdout=PIPE, stderr=PIPE)
41:   File "/usr/gapps/parthenon_shared/parthenon-project/views/rzansel/ppc64le/gcc8/2021-01-04/lib/python3.8/subprocess.py", line 512, in run
41:     raise CalledProcessError(retcode, process.args,
41: subprocess.CalledProcessError: Command '['/usr/tcetmp/bin/jsrun', '-a', '4', '-c', '1', '-n', '1', '-g', '4', '-r', '1', '-d', 'packed', "--smpiargs='-gpu'", '/g/g15/brown338/Software/parthenon/build/example/advection/advection-example', '-i', '/g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection', 'parthenon/mesh/nx1=32', 'parthenon/meshblock/nx1=32', 'parthenon/mesh/nx2=1', 'parthenon/meshblock/nx2=1', 'parthenon/mesh/nx3=1', 'parthenon/meshblock/nx3=1', 'Advection/vy=0.0', 'Advection/vz=0.0', '--kokkos-num-devices=4', '--kokkos-threads=1']' returned non-zero exit status 134.
41: 
41: During handling of the above exception, another exception occurred:
41: 
41: Traceback (most recent call last):
41:   File "/g/g15/brown338/Software/parthenon/tst/regression/run_test.py", line 152, in <module>
41:     main(**vars(args))
41:   File "/g/g15/brown338/Software/parthenon/tst/regression/run_test.py", line 76, in main
41:     test_manager.Run()
41:   File "/g/g15/brown338/Software/parthenon/tst/regression/utils/test_case.py", line 245, in Run
41:     raise TestManagerError('\nReturn code {0} from command \'{1}\''
41: utils.test_case.TestManagerError: 
41: Return code 134 from command '/usr/tcetmp/bin/jsrun -a 4 -c 1 -n 1 -g 4 -r 1 -d packed --smpiargs='-gpu' /g/g15/brown338/Software/parthenon/build/example/advection/advection-example -i /g/g15/brown338/Software/parthenon/tst/regression/test_suites/advection_convergence/parthinput.advection parthenon/mesh/nx1=32 parthenon/meshblock/nx1=32 parthenon/mesh/nx2=1 parthenon/meshblock/nx2=1 parthenon/mesh/nx3=1 parthenon/meshblock/nx3=1 Advection/vy=0.0 Advection/vz=0.0 --kokkos-num-devices=4 --kokkos-threads=1'
1/1 Test #41: regression_mpi_test:advection_convergence ...***Failed   68.48 sec

@pgrete (Collaborator) commented Feb 1, 2021

Yes, I also noticed that while updating/testing on Summit following my previous comment.
In that process I also noticed that the advection_performance test should, in fact, run with more MPI processes, as I originally added logic to make the MeshBlocks smaller based on the number of MPI processes involved.
That now makes me believe that the logic behind parameters.num_ranks is broken, i.e., it may always be 1.
This led me to test_case.py, where

        argstrings = ['-np','-n']
        if len(set(argstrings) & set(self.parameters.mpi_opts)) > 1:
          print('Warning! You have set both "-n" and "-np" in your MPI options.')
          print(self.parameters.mpi_opts)
        for s in argstrings:
          if s in self.parameters.mpi_opts:
            index = self.parameters.mpi_opts.index(s)
            if index < len(self.parameters.mpi_opts) - 1:
              try:
                self.parameters.num_ranks = int(self.parameters.mpi_opts[index+1])
              except ValueError:
                pass

so I now wonder if that logic still works (in general, and specifically for the jsrun-based parameters used on Summit).
I'm going to call it a day, but if you'd like to dig deeper today, my best bet is somewhere around these pieces.
Otherwise, I'll dig deeper tomorrow.

Bottom line: there's definitely a bug somewhere that resulted in things not working as we expected.
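
For reference, in the logs above the options arrive as mpirun_opts=['-a', '4', "-c 1 -n 1 -g 1 ..."], so '-n' never appears as a standalone list element and the loop in the excerpt never updates num_ranks. Below is a hypothetical sketch (not a proposed patch) of that failure mode plus one possible workaround that tokenizes the options first; note that for jsrun the total rank count is the number of resource sets (-n) times the tasks per resource set (-a), so reading '-n' alone would still not give the MPI rank count:

    import shlex

    # Options exactly as they appear in the failing test output above.
    mpi_opts = ['-a', '4', "-c 1 -n 1 -g 1 -r 1 -d packed --smpiargs='-gpu'"]

    # The existing detection looks for '-n'/'-np' as whole list elements,
    # so it misses the '-n' buried inside the combined string:
    print('-n' in mpi_opts)  # False -> num_ranks keeps its default value

    # Hypothetical workaround: flatten the options into individual tokens first.
    tokens = [tok for opt in mpi_opts for tok in shlex.split(opt)]
    # ['-a', '4', '-c', '1', '-n', '1', '-g', '1', '-r', '1', '-d', 'packed', '--smpiargs=-gpu']

    # For jsrun, total MPI ranks = resource sets (-n) * tasks per resource set (-a).
    nrs = int(tokens[tokens.index('-n') + 1]) if '-n' in tokens else 1
    tasks_per_rs = int(tokens[tokens.index('-a') + 1]) if '-a' in tokens else 1
    print(nrs * tasks_per_rs)  # -> 4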

@JoshuaSBrown (Collaborator Author)

Alright, this should be good to go.

@JoshuaSBrown (Collaborator Author)

All tests pass.

@JoshuaSBrown JoshuaSBrown enabled auto-merge (squash) February 3, 2021 21:12
@jlippuner (Collaborator) left a comment

Nice work! I just tested it on RZAnsel and it seems to work as expected.

I made a few minor comments/suggestions but nothing to hold up merging.

#### Allocate Node

[RZAnsel](https://hpc.llnl.gov/hardware/platforms/rzansel) is a homogeneous cluster consisting of 2,376 nodes with the IBM Power9
architecture with 44 nodes per core and 4 Nvidia Volta GPUs per node. To
Collaborator

Suggested change:
- architecture with 44 nodes per core and 4 Nvidia Volta GPUs per node. To
+ architecture with 44 cores per node and 4 Nvidia Volta GPUs per node. To

$ lalloc 1
```

#### Set-Up Environment (Optional, but Still Recommended, for Non-CUDA Builds)
Collaborator

I am confused by this (also for the Darwin instructions).

Is this whole set of instructions only for non-CUDA builds? If so, what are the instructions for CUDA builds? Or is it optional (but still recommended) for non-CUDA builds and required for CUDA builds?

Collaborator

Alright, the last sentence answers this question, I think.

Maybe we should call this section "Set-Up Environment (required for CUDA builds, optional but recommended for non-CUDA builds)".

Collaborator Author

@AndrewGaspar, @jlippuner raises a good point: why isn't the build configuration simply optional? As far as I can tell, there is nothing in there but links to ninja, cmake, the compiler, and git. Is it because of the dependence of CUDA on the compiler?

@@ -385,6 +385,69 @@ Once you've configured your build directory, you can build with
LANL Employees - to understand how the project space is built out, see
https://re-git.lanl.gov/eap-oss/parthenon-project

### LNLL RZAnsel (Homogeneous)

Last verified 04 Jan 2021.
Collaborator

Is this up-to-date?

Collaborator Author

I'll fix these in a separate PR then.

@JoshuaSBrown JoshuaSBrown merged commit 0d96f15 into develop Feb 3, 2021
@Yurlungur Yurlungur deleted the JoshuaSBrown/setup-rzansel-machine-config branch February 9, 2021 17:07