BinaryBH example: Outputs of GPU runs differ from those of CPU runs #77
Comments
Good spot, @julianakwan (although I do feel a bit stupid for not having noticed this before). I can reproduce this with a CUDA build on the A100s. I believe the issue comes from the
I haven't had time to work out whether all of them are necessary. Given that synchronization is expensive, we should add these calls only where they are necessary.
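To illustrate why a missing synchronization can change the output, here is a minimal analogy in plain Python. It is not GRTeclyn or AMReX code: a worker thread stands in for an asynchronous GPU kernel, `future.result()` stands in for a synchronization call, and every name (`launch_and_read`, `kernel`, `gate`) is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def launch_and_read(synchronize: bool) -> list:
    """Launch an asynchronous 'kernel' that fills a buffer, then read it on the 'host'."""
    buffer = [0.0] * 4
    # The gate holds the kernel back so the unsynchronized read is reproducible.
    gate = threading.Event()

    def kernel() -> None:
        gate.wait()
        for i in range(len(buffer)):
            buffer[i] = float(i + 1)

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(kernel)
        if synchronize:
            gate.set()
            future.result()          # stand-in for a synchronization call
            snapshot = list(buffer)  # kernel has finished: values are up to date
        else:
            snapshot = list(buffer)  # kernel has not run yet: values are stale
            gate.set()
    return snapshot


synced = launch_and_read(synchronize=True)
unsynced = launch_and_read(synchronize=False)
```

The synchronized read sees the finished values, while the unsynchronized one returns whatever was in the buffer before the kernel ran. Each synchronization also forces the host to wait, which is why such calls are worth adding only where a result is actually read.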
Thanks a lot for fixing this @mirenradia! There was only one extra Here are the results of
I am getting some errors in the SYCL build of our GPU workflow, so I will try to fix those first before starting a PR.
Our SYCL build is failing when using the latest version of the oneAPI compiler ( I will open a separate issue for fixing the SYCL build with the 2025.0 compiler.
I don't think there is a problem with oneAPI 2025.0. At least it worked for me on Dawn (see #46 (comment)).
There is some discrepancy between the outputs of the Binary BH example run using the parameter file params_test.txt with CUDA, SYCL and CPU builds. Some of the differences are quite large and, since params_test.txt is the basis of our regression test, I've commented out the final line in .gitlab-ci.yml, which compares the plotfiles from the A100 build to those in the .github/workflows/data directory.
Differences observed on Wilkes (A100 build)
Here are the differences between the GRTeclyn outputs built using CUDA on Wilkes3 and .github/workflows/data/plt00008_compare using params_test.txt:
How to reproduce
In an interactive job on a Wilkes node, load these modules (these are the same as for the current version of .gitlab-ci.yml):
Using a fresh pull from the develop branch of GRTeclyn, navigate to Examples/BinaryBH and create the executable:
Then, to run the build:
This will give you a plotfile called plt00008. You can then compare this with the one we currently use for regression testing (but for CPU builds only) using fcompare:
(NB: at this stage, I was no longer on the compute node but back on a login node, so the Intel build of fcompare is appropriate.)
Differences observed on Dawn (SYCL build)
Here are the differences between the GRTeclyn outputs built using SYCL on Dawn and .github/workflows/data/plt00008_compare using params_test.txt:
How to reproduce
Submit an interactive job on Dawn, then load these modules on a compute node:
Then, in the Examples/BinaryBH directory, make the binary black hole example:
Finally, to run the example:
Again, fcompare can be run on the outputs to produce the above result.
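As a rough picture of what a comparison between two sets of output values reports, the sketch below computes the maximum absolute difference for a single variable and a relative difference normalized by the largest reference value. This is plain Python for illustration only, not the fcompare implementation; the function name, the normalization choice, and the sample values are all assumptions.

```python
def max_norm_errors(reference, candidate):
    """Return (absolute, relative) max-norm differences between two value lists."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Largest pointwise difference between the two outputs.
    abs_err = max(abs(r - c) for r, c in zip(reference, candidate))
    # Normalize by the largest magnitude in the reference data.
    ref_norm = max(abs(r) for r in reference)
    if ref_norm > 0.0:
        rel_err = abs_err / ref_norm
    else:
        rel_err = 0.0 if abs_err == 0.0 else float("inf")
    return abs_err, rel_err


# Hypothetical values standing in for one variable from a CPU run and a GPU run.
cpu_values = [1.0, 2.0, 4.0]
gpu_values = [1.0, 2.5, 4.0]
abs_err, rel_err = max_norm_errors(cpu_values, gpu_values)
print(f"absolute error: {abs_err}, relative error: {rel_err}")
```

A regression test would then compare these numbers against a chosen tolerance. Differences near floating-point round-off are expected between CPU and GPU builds, whereas large ones like those reported in this issue point to a real bug.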