CCPP acceptance: fv3_stochy / fv3_ccpp stochy bit-for-bit identical #205

Merged
merged 3 commits into from
Feb 14, 2019

Conversation

climbfuji
Collaborator

@climbfuji climbfuji commented Feb 7, 2019

These PRs lower the optimization level of one routine deep inside the stochastic physics code so that the (fv3_control based) regression tests fv3_stochy and fv3_ccpp_stochy (static build) give bit-for-bit identical results in PROD mode.

@climbfuji
Collaborator Author

climbfuji commented Feb 8, 2019

Standard regression tests (Theia, Intel 18, REPRO) all pass as expected.

rt_ccpp_hybrid.log
rt_ccpp_ref_create.log
rt_ccpp_standalone.log
rt_ccpp_static.log
rt_full.log

@climbfuji
Collaborator Author

CCPP acceptance tests passed/failed as expected. List of failing tests (one fewer than before):

fv3_ccpp_stretched
fv3_ccpp_stretched_nest
fv3_ccpp_regional_control
fv3_ccpp_regional_restart
fv3_ccpp_regional_quilt
fv3_ccpp_regional_c768
fv3_ccpp_control_debug
fv3_ccpp_stretched_nest_debug
fv3_ccpp_gfdlmp
fv3_ccpp_csawmgshoc
fv3_ccpp_csawmg3shoc127
fv3_ccpp_csawmg
fv3_ccpp_gfdlmp_32bit
fv3_ccpp_cpt

rt_ccpp_ref_for_acceptance_create.log
rt_ccpp_static_for_acceptance.log

@climbfuji
Collaborator Author

Ready to merge!

Contributor

@llpcarson llpcarson left a comment

approved

Collaborator

@grantfirl grantfirl left a comment

All I could find were whitespace changes for get_stochy_pattern.F90 (correct?). So, the !DIR$ OPTIMIZE:1 is what lowers optimization for one subroutine within a file? I didn't even know that was possible. Cool.

@climbfuji
Collaborator Author

Yes, !DIR$ OPTIMIZE:N is an Intel-specific compiler directive; other compilers ignore it. It lowers the optimization level for this routine only, not for any preceding or following subroutines, and not for any "contained" subroutines either. N can be 0, 1 or 2, and I believe even higher. You can also write !DIR$ NOOPTIMIZE instead of !DIR$ OPTIMIZE:0.
These directives are a little confusing because their scopes differ. Another one is !DIR$ NOFMA, which disables the fused multiply-adds (FMAs) that come with AVX2 from the point in the file where the directive appears until a !DIR$ FMA is encountered. (Just for your info.)
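
For illustration, here is a minimal sketch of how a per-routine !DIR$ OPTIMIZE:1 might look in a free-form source file. The module and routine names (directive_demo, scaled_sum_reduced, scaled_sum_default) are hypothetical and are not taken from get_stochy_pattern.F90. Placing the directive directly after the subroutine statement is an assumption that should be checked against the Intel compiler documentation. Compilers other than Intel see the directive line as an ordinary comment, so the file should still build with them.

```fortran
! Illustrative sketch only: all names here are hypothetical and do not
! come from get_stochy_pattern.F90.
module directive_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Intended to be compiled at reduced optimization by Intel Fortran.
  ! Other compilers treat the directive line below as a plain comment.
  subroutine scaled_sum_reduced(x, a, s)
!DIR$ OPTIMIZE:1
    real(dp), intent(in)  :: x(:)
    real(dp), intent(in)  :: a
    real(dp), intent(out) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, size(x)
      s = s + a * x(i)
    end do
  end subroutine scaled_sum_reduced

  ! No directive here: this routine keeps the optimization level
  ! given on the compiler command line.
  subroutine scaled_sum_default(x, a, s)
    real(dp), intent(in)  :: x(:)
    real(dp), intent(in)  :: a
    real(dp), intent(out) :: s
    integer :: i
    s = 0.0_dp
    do i = 1, size(x)
      s = s + a * x(i)
    end do
  end subroutine scaled_sum_default

end module directive_demo

program main
  use directive_demo
  implicit none
  real(dp) :: x(4), s1, s2
  x = (/ 0.1_dp, 0.2_dp, 0.3_dp, 0.4_dp /)
  call scaled_sum_reduced(x, 3.0_dp, s1)
  call scaled_sum_default(x, 3.0_dp, s2)
  ! Whether the two sums agree to the last bit depends on the compiler,
  ! its flags and the target instruction set.
  print *, s1, s2
end program main
```

The multiply-add inside the loop is the kind of expression a compiler may contract into an FMA at higher optimization levels, which is one way the last bits of a result can change between builds.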

@climbfuji climbfuji merged commit 154ce4b into NCAR:master Feb 14, 2019
@climbfuji climbfuji deleted the stochy_bitforbit_prod branch June 27, 2022 03:24
Qingfu-Liu pushed a commit to Qingfu-Liu/ccpp-physics that referenced this pull request May 18, 2024
Combination PR for ozone diagnostics, metadata intent bugfixes, sfcsub.F landmask bugfix, and canopy resistance output
3 participants