E3SM-kernels #1416
Comments
The following is a reproducer for a segmentation fault that I get when running the atmosphere kernel. There are a number of comments inline about changes that make the segfault disappear (prefixed with !NOTE). Any suggestions for where to look to fix this? I'm compiling with
Hi @AlexisPerry, I think there is something wrong with the base address computation of So if you want to look into it, it might be worth looking at the fir.embox codegen base address computation (here). Here is my small FIR repro that you can compile with tco+llc and link with the Fortran runtime:
I suspect something is wrong with the
@AlexisPerry, are you already working on the issue? Otherwise, I can investigate more and make a fix for this issue.
@jeanPerier I've started working on it, but I haven't gotten too far yet. If you have other things you're working on, I'd prioritize those for now. I'll ping you if (when?) I get stuck. Thanks!
Sounds good to me, thanks for looking into this!
Hi @AlexisPerry, do not hesitate to ask if you need any help on this one. Given that it can lead compiled programs into undefined behavior, I think we should try to get this bug fixed before lowering is fully upstreamed, so that people do not run into it without being able to easily link the problem to this issue.
Fix #1416. The `constRows` variable was being decremented too soon, causing the last constant interior dimension extent to be used to multiply the GEP offset. This led to wrong address computation and caused segfaults.
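To make the failure mode in the commit message concrete, here is a small Python model of it. This is only a sketch under stated assumptions, not the flang codegen: the function names, the `decrement_early` flag, and the way the GEP is modeled (the leading `const_rows` dimensions are folded into the pointee type, so the pointer offset counts whole constant-extent blocks) are all made up for illustration.

```python
from math import prod


def reference_offset(extents, offsets):
    """Column-major element offset computed directly (ground truth)."""
    stride, total = 1, 0
    for extent, off in zip(extents, offsets):
        total += off * stride
        stride *= extent
    return total


def gep_style_offset(extents, offsets, const_rows, decrement_early):
    """Hypothetical model of a GEP-based offset computation.

    The first `const_rows` dimensions have constant extents and are
    indexed through the pointee type; the remaining dimensions are
    accumulated into `ptr_offset`, counted in constant-extent blocks.
    `decrement_early=True` mimics the described bug.
    """
    block = prod(extents[:const_rows])  # elements per constant block
    remaining = const_rows
    prev_ptr_off = 1
    ptr_offset = 0
    interior = []
    for extent, off in zip(extents, offsets):
        if remaining > 0:
            interior.append(off)  # handled by the constant-shape indices
            if decrement_early:
                remaining -= 1  # bug: counter hits zero one step too soon
        else:
            ptr_offset += off * prev_ptr_off
        if remaining == 0:
            # Should only scale by extents of the non-constant dimensions.
            prev_ptr_off *= extent
        elif not decrement_early:
            remaining -= 1  # intended place for the decrement
    interior_off = reference_offset(extents[:const_rows], interior)
    return ptr_offset * block + interior_off


if __name__ == "__main__":
    extents, offsets = [2, 3, 4, 5], [1, 2, 3, 4]
    print(reference_offset(extents, offsets))
    print(gep_style_offset(extents, offsets, 2, decrement_early=False))
    print(gep_style_offset(extents, offsets, 2, decrement_early=True))
```

In this model, with extents `[2, 3, 4, 5]` and two constant interior dimensions, the late decrement reproduces the reference offset of 119, while the early decrement scales the pointer offset by the last constant extent (3) and yields 347, which is past the end of a 120-element array.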
Hi @AlexisPerry and @kiranchandramohan, the fix for the reported problem was merged. Please close this issue whenever you are able to verify that it actually fixed the E3SM issue.
Fix flang-compiler#1416. The `constRows` variable was being decremented too soon, causing the last constant interior dimension extent to be used to multiply the GEP offset. This led to wrong address computation and caused segfaults.
I just confirmed that the new patch fixes the error with the atmosphere kernel. It runs now (albeit slower) and gets the same answer as gfortran.
I get the following error when running mmf-mpdata-tracer built with
Fix flang-compiler/f18-llvm-project#1416. The `constRows` variable was being decremented too soon, causing the last constant interior dimension extent to be used to multiply the GEP offset. This led to wrong address computation and caused segfaults. Note: also upstream the fir.embox tests that can be upstreamed. Differential Revision: https://reviews.llvm.org/D123130
Just checked and all the kernels build and run now. Here is the process I used following a standard build of flang from main: atmosphere
make flang=1 mmf-mpdata-tracer
make flang=1 nested_loops
make llvm
Enable building and execution of E3SM kernels with the flang compiler.