[WIP] OSU Microbenchmark example for LUMI #1

Open
wants to merge 20 commits into base: master

@rsarm rsarm commented Mar 3, 2022

This is an example with some OSU tests, based on a test from reframe/hpctestlib. This branch continues the work from PR reframe-hpc#2421, which is not finished yet, so this is going to change.

The tests can be run with

./bin/reframe -C config/lumi.py -r -c lumi-checks/microbenchmarks/mpi/osu/osu_tests.py

from the reframe base directory.

@rsarm rsarm requested review from egplar, olouant and mszpindler March 3, 2022 14:46
@rsarm rsarm self-assigned this Mar 3, 2022
@olouant olouant changed the title OSU Microbenchmark example for LUMI [WIP] OSU Microbenchmark example for LUMI Mar 3, 2022
Comment on lines 16 to 25
stack_name = os.getenv('LUMI_STACK_NAME', None)
stack_version = os.getenv('LUMI_STACK_VERSION', None)

environs = ['PrgEnv-gnu', 'PrgEnv-cray']

if stack_name and stack_name == 'LUMI':
    environs += ['cpeGNU', 'cpeCray']

    if version.parse(stack_version) > version.parse('21.08'):
        environs += ['cpeAOCC']

I had to create this hacky solution in order to support the different software stacks of LUMI. cpeAOCC was introduced in LUMI/21.12 and is not present in 21.08. The PrgEnv-aocc provided by Cray is still broken.
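
For illustration, a more defensive variant of the snippet above could look like the sketch below. It assumes the stack version always has the dotted numeric form seen in this thread (21.08, 21.12) and swaps the `version.parse` call from the diff for a small hypothetical helper, `parse_stack_version`, so the example has no dependencies. `select_environs` is likewise a made-up name, not something from the PR.

```python
import os


def parse_stack_version(text):
    """Turn a dotted numeric version into a tuple, e.g. '21.12' -> (21, 12).

    Hypothetical stand-in for version.parse; it only handles purely
    numeric components.
    """
    return tuple(int(part) for part in text.split('.'))


def select_environs(env=None):
    """Return the programming environments to test for the active stack."""
    if env is None:
        env = os.environ

    stack_name = env.get('LUMI_STACK_NAME')
    stack_version = env.get('LUMI_STACK_VERSION')

    # Environments that do not depend on the LUMI software stack.
    environs = ['PrgEnv-gnu', 'PrgEnv-cray']

    if stack_name == 'LUMI':
        environs += ['cpeGNU', 'cpeCray']
        # cpeAOCC is only added for stacks newer than 21.08. The version
        # check is skipped when the variable is unset, so no exception is
        # raised in that case.
        if stack_version and parse_stack_version(stack_version) > (21, 8):
            environs += ['cpeAOCC']

    return environs
```

Passing a plain dict instead of the real environment keeps the selection easy to check: with `LUMI_STACK_NAME` set to `LUMI` and `LUMI_STACK_VERSION` set to `21.12` the result includes `cpeAOCC`, and with the version unset it falls back to the four remaining environments.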

config/lumi.py Outdated
],
'max_jobs': 100,
'modules': ['LUMI'],

If we do this, the default module (21.08) is loaded and we do not test 21.12. It also leads to failures of the cpeAOCC tests.

Author

I think that removing 'modules': ['LUMI'] is not a good idea. To load the cpeXXX modules, the LUMI module needs to be loaded first. ReFrame does that, so we don't have to do it ourselves. One solution is to keep LUMI with no version in the configuration and then use --map-module LUMI:LUMI/21.12 when we run the tests.
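
As a rough sketch of what this proposal means for the configuration, the partition entry would keep the unversioned module, as below. Only 'max_jobs' and 'modules' come from the diff under review; the system name, partition name, scheduler, launcher and environment list are placeholders chosen for illustration and are not taken from config/lumi.py.

```python
# Hypothetical excerpt of a ReFrame site configuration. Only 'max_jobs' and
# 'modules' appear in the reviewed diff; every other value is a placeholder.
site_configuration = {
    'systems': [
        {
            'name': 'lumi',                    # placeholder
            'descr': 'LUMI (illustrative entry)',
            'hostnames': ['.*'],               # placeholder pattern
            'modules_system': 'lmod',          # assumption
            'partitions': [
                {
                    'name': 'small',           # placeholder
                    'scheduler': 'slurm',      # assumption
                    'launcher': 'srun',        # assumption
                    'environs': ['cpeGNU', 'cpeCray'],
                    'max_jobs': 100,
                    # Left unversioned on purpose: the concrete stack
                    # (e.g. LUMI/21.12) is picked at run time through the
                    # module mapping given on the command line.
                    'modules': ['LUMI'],
                },
            ],
        },
    ],
}
```

With an entry like this, the stack version is not hard-coded in the configuration, and the same file can be reused for 21.08 and 21.12 by changing only the mapping passed when the tests are run.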

Makes sense. I restored 'modules': ['LUMI'] in the configuration.

It may have been me who removed LUMI by accident; apologies.

Comment on lines 170 to 174
class osu_init(osu_benchmark_test_base):
    @sanity_function
    def validate_test(self):
        #return sn.assert_eq(self.job.exitcode, 0)
        return sn.assert_found(rf'^nprocs: {self.num_tasks}', self.stdout)

I have added an osu_init test, just as an exercise. Please check that the syntax is correct and that I am not disrupting the generic base class.
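
To help with reviewing the pattern itself, here is a minimal stand-alone sketch of the same check using only Python's `re` module. It assumes `^` is meant to anchor at the start of any output line (hence `re.MULTILINE`); the function names and the sample output are made up for illustration. One thing it shows is that the pattern as written is a prefix match, so with 2 tasks it would also accept a line reporting 20; a trailing `\b` is one possible way to tighten it.

```python
import re


def found_nprocs(stdout_text, num_tasks):
    """Check for a line starting with 'nprocs: <num_tasks>' (prefix match)."""
    pattern = rf'^nprocs: {num_tasks}'
    return re.search(pattern, stdout_text, re.MULTILINE) is not None


def found_nprocs_strict(stdout_text, num_tasks):
    """Same check, but the number must end at a word boundary."""
    pattern = rf'^nprocs: {num_tasks}\b'
    return re.search(pattern, stdout_text, re.MULTILINE) is not None


# Made-up sample output, only meant to exercise the two patterns.
sample = 'header line\nnprocs: 20, more text\n'

print(found_nprocs(sample, 2))          # prefix match accepts 20 for 2
print(found_nprocs_strict(sample, 2))   # word boundary rejects it
print(found_nprocs_strict(sample, 20))  # exact task count still matches
```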

@mszpindler

Yesterday Kurt made me realise that the tests should be built using the partition C module instead of the default L (specific to the LUMI software stack). This is essential for performance tests, since C applies zen3 optimizations and L does not.

@mszpindler

Yesterday Kurt made me realise that the tests should be built using the partition C module instead of the default L (specific to the LUMI software stack). This is essential for performance tests, since C applies zen3 optimizations and L does not.

I'm not sure this is the best approach for it; see PR #3.

self.build_system.cxx = 'hipcc'
self.build_system.cflags = ['-I/opt/cray/pe/mpich/8.1.8/ofi/crayclang/10.0/include']
self.build_system.ldflags = ['-L/opt/cray/pe/mpich/8.1.8/ofi/crayclang/10.0/lib', '-lmpi', '-L/opt/cray/pe/mpich/8.1.8/gtl/lib', '-lmpi_gtl_hsa']
self.build_system.cppflags = ['-D__HIP_PLATFORM_AMD__']

Author

Would something like what was suggested here make sense: #4 (comment)?

Added to #3.

@mszpindler

Waiting for #3 to be merged; cleanup and catching up with ReFrame upstream are still required.

@mszpindler

Now updated to ReFrame upstream. After the adaptation to the LUMI small and eap partitions, issues #5, #6 and #7 remain.

@mszpindler mszpindler marked this pull request as ready for review April 14, 2022 09:50
3 participants