[WIP] OSU Microbenchmark example for LUMI #1
base: master
stack_name = os.getenv('LUMI_STACK_NAME', None)
stack_version = os.getenv('LUMI_STACK_VERSION', None)

environs = ['PrgEnv-gnu', 'PrgEnv-cray']

if stack_name and stack_name == 'LUMI':
    environs += ['cpeGNU', 'cpeCray']

    if version.parse(stack_version) > version.parse('21.08'):
        environs += ['cpeAOCC']
I had to create this hacky solution to support the different software stacks of LUMI. cpeAOCC was introduced in LUMI/21.12 and is not present in 21.08. The PrgEnv-aocc provided by Cray is still broken.
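A self-contained sketch of the same gating logic, factored into a hypothetical lumi_environs() helper so it can be checked without a LUMI login. It swaps packaging's version.parse for a plain tuple comparison (an assumption that stack versions always look like 'YY.MM', e.g. '21.08'), and it only compares versions when a version is actually known, so an unset LUMI_STACK_VERSION cannot reach the comparison.

```python
# Hypothetical helper mirroring the environment selection in the config.
# Assumption: LUMI stack versions follow the 'YY.MM' pattern, e.g. '21.08'.

def parse_stack_version(ver: str) -> tuple:
    """Turn a 'YY.MM' string into a comparable tuple of ints."""
    return tuple(int(part) for part in ver.split('.'))


def lumi_environs(stack_name, stack_version):
    """Return the programming environments to test for a given stack."""
    environs = ['PrgEnv-gnu', 'PrgEnv-cray']
    if stack_name == 'LUMI':
        environs += ['cpeGNU', 'cpeCray']
        # cpeAOCC only exists from LUMI/21.12 onwards; skip the check when
        # the version is unknown so we never try to parse None.
        if stack_version and parse_stack_version(stack_version) > (21, 8):
            environs += ['cpeAOCC']
    return environs


if __name__ == '__main__':
    for ver in ('21.08', '21.12'):
        print(ver, lumi_environs('LUMI', ver))
```

Comparing integer tuples avoids the classic string-comparison pitfall ('21.8' > '21.12' as strings) without pulling in an extra dependency.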
config/lumi.py
    ],
    'max_jobs': 100,
    'modules': ['LUMI'],
If we do this, the default module (21.08) is loaded and we do not test 21.12. It also leads to failures of the cpeAOCC tests.
I think that removing 'modules': ['LUMI'] is not a good idea. To load the cpeXXX modules, the LUMI module needs to be loaded first, and ReFrame does that for us, so we don't have to do it ourselves. A solution could be to keep LUMI without a version in the configuration and then pass --map-module LUMI:LUMI/21.12 when we run the tests.
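To illustrate what the suggested mapping does to the partition's module list, here is a small sketch. map_modules() and parse_mapping() are hypothetical helpers written for this example, not ReFrame's implementation; they assume the mapping syntax is 'SOURCE:TARGET1 [TARGET2 ...]', where every occurrence of SOURCE is replaced by the listed targets.

```python
# Illustration only: mimics the effect of a module mapping such as
# --map-module LUMI:LUMI/21.12 on a partition's 'modules' list.
# These helpers are hypothetical and are not ReFrame's own code.

def parse_mapping(spec: str) -> tuple:
    """Split 'SRC:DEST1 DEST2 ...' into (SRC, [DEST1, DEST2, ...])."""
    src, _, dests = spec.partition(':')
    return src.strip(), dests.split()


def map_modules(modules, spec):
    """Replace every occurrence of SRC in modules with its mapped targets."""
    src, dests = parse_mapping(spec)
    result = []
    for mod in modules:
        result.extend(dests if mod == src else [mod])
    return result


if __name__ == '__main__':
    partition_modules = ['LUMI']  # as in config/lumi.py
    print(map_modules(partition_modules, 'LUMI:LUMI/21.12'))
```

Keeping the unversioned name in the configuration means the stack version becomes a run-time choice, so the same config can test 21.08 and 21.12 without edits.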
Makes sense. I restored 'modules': ['LUMI'] in the configuration.
It may have been me who removed LUMI by accident, apologies.
class osu_init(osu_benchmark_test_base):
    @sanity_function
    def validate_test(self):
        # return sn.assert_eq(self.job.exitcode, 0)
        return sn.assert_found(rf'^nprocs: {self.num_tasks}', self.stdout)
I have added the osu_init test, just as an exercise. Please review whether the syntax is correct and whether it disrupts the generic base class.
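The regex in validate_test can be checked in isolation with plain re. This sketch assumes multiline matching (so that ^ anchors at every line start) and uses a made-up sample line rather than real osu_init output. It also shows one possible tightening: appending \b so that 'nprocs: 4' does not also match 'nprocs: 40'.

```python
import re


def nprocs_reported(stdout: str, num_tasks: int) -> bool:
    """Return True if some line starts with 'nprocs: <num_tasks>'."""
    # re.MULTILINE lets '^' match at the start of every line, not only
    # at the start of the whole output.
    # The trailing \b stops 'nprocs: 4' from also matching 'nprocs: 40'.
    pattern = rf'^nprocs: {num_tasks}\b'
    return re.search(pattern, stdout, re.MULTILINE) is not None


if __name__ == '__main__':
    # Made-up sample output, only for exercising the pattern.
    sample = '# header line\nnprocs: 4, min: 10 ms\n'
    print(nprocs_reported(sample, 4))
    print(nprocs_reported(sample, 40))
```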
Yesterday Kurt made me realise that tests should be built using the partition.
Not sure if it is the best approach for this: PR #3
self.build_system.cxx = 'hipcc'
self.build_system.cflags = ['-I/opt/cray/pe/mpich/8.1.8/ofi/crayclang/10.0/include']
self.build_system.ldflags = ['-L/opt/cray/pe/mpich/8.1.8/ofi/crayclang/10.0/lib', '-lmpi', '-L/opt/cray/pe/mpich/8.1.8/gtl/lib', '-lmpi_gtl_hsa']
self.build_system.cppflags = ['-D__HIP_PLATFORM_AMD__']
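The flags above hardcode the MPICH 8.1.8 installation, so they break as soon as the cray-mpich version changes. One way to avoid that is to derive every path from a single root directory, as in this sketch. hip_mpi_flags() and the HYPOTHETICAL_MPICH_ROOT environment variable are placeholders for illustration; the real variable exported by the cray-mpich module should be checked (e.g. with module show cray-mpich) before relying on it.

```python
import os


def hip_mpi_flags(mpich_root: str, compiler_dir: str = 'ofi/crayclang/10.0'):
    """Build the hipcc flags from one MPICH root instead of hardcoded paths."""
    prefix = os.path.join(mpich_root, compiler_dir)
    return {
        'cflags': [f'-I{os.path.join(prefix, "include")}'],
        'ldflags': [
            f'-L{os.path.join(prefix, "lib")}', '-lmpi',
            # GTL library, linked exactly as in the hardcoded flags
            f'-L{os.path.join(mpich_root, "gtl", "lib")}', '-lmpi_gtl_hsa',
        ],
        'cppflags': ['-D__HIP_PLATFORM_AMD__'],
    }


if __name__ == '__main__':
    # HYPOTHETICAL_MPICH_ROOT is a placeholder name; fall back to the
    # path that is currently hardcoded in the test.
    root = os.environ.get('HYPOTHETICAL_MPICH_ROOT', '/opt/cray/pe/mpich/8.1.8')
    for key, value in hip_mpi_flags(root).items():
        print(key, value)
```

Inside the test, the returned lists would be assigned to self.build_system.cflags, ldflags and cppflags, leaving only the root directory as a site-specific setting.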
Would something like what was suggested in #4 (comment) make sense?
Added to #3.
Waiting for #3 to be merged; cleanup and catching up with ReFrame upstream are still required.
This is an example with some OSU tests based on a test from reframe/hpctestlib. This branch continues the one from PR reframe-hpc#2421, which still hasn't been finished, so this is going to change. The tests can be run with
from the ReFrame base directory.