Crash when compiling miniqmc #22

Closed
ye-luo opened this issue Aug 1, 2019 · 7 comments

Comments


ye-luo commented Aug 1, 2019

aomp-0.7 has this issue. 0.6.5 was fine.

clang-9: /home/yeluo/git/aomp/llvm-project/llvm/lib/IR/Instructions.cpp:1349: void llvm::StoreInst::AssertOK(): Assertion `getOperand(0)->getType() == cast<PointerType>(getOperand(1)->getType())->getElementType() && "Ptr must be a pointer to Val type!"' failed.

reproducer

git clone https://github.com/ye-luo/miniqmc
cd miniqmc/build
cmake -DCMAKE_CXX_COMPILER=/home/yeluo/rocm/aomp_0.7-0/bin/clang++ \
-DENABLE_OFFLOAD=1 -DOFFLOAD_TARGET=amdgcn-amd-amdhsa \
-DCMAKE_CXX_FLAGS="-Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906" \
..
make -j15

ronlieb commented Aug 1, 2019 via email


ye-luo commented Aug 1, 2019

I built everything from scratch last night using the build script provided in this repo. Should I check out a specific branch of llvm-project?

gregrodgers commented

Not yet. I can reproduce the compile failures in miniqmc. I was hoping they would be easy to fix before we release 0.7-0. As 0.7-0 stands now, miniqmc fails to compile, so there is no need to rebuild. There are many places in clang that are missing addrspacecasts, and your code found another one.

gregrodgers commented

Ye-luo, I found the bug that caused the compiler to crash on miniqmc and pushed the fix to llvm-project a few minutes ago. Until we release 0.7-0, you need to build from source. If you have already done that, you can pull the update to llvm-project, which is in branch AOMP-190715. You get that branch, along with all the other correct 0.7 branches, by checking out branch 0.7 in this aomp repo and then running clone_aomp.sh. If you did that already, just rerun clone_aomp.sh and you will get the fix, which is in the file CodeGenFunction.h. Then run "build_project.sh install", which will pick up the changed file and rebuild and install the affected clang components.

Greg
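
The update steps described in the comment above could be sketched as the short script below. This is only an illustration, not an official procedure: `AOMP_REPO` and `RUN` are hypothetical variables introduced here, the default checkout path is a guess, and the assumption that `clone_aomp.sh` and `build_project.sh` live in the repo's `bin/` directory is not stated in the thread. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the update steps from the comment above.
# Assumptions (not from the thread): AOMP_REPO names wherever the aomp repo
# was cloned, and the helper scripts live in its bin/ directory.
AOMP_REPO="${AOMP_REPO:-$HOME/git/aomp}"
# Dry run by default: each command is echoed. Run with RUN= (empty) to execute.
RUN="${RUN-echo}"

update_aomp() {
    $RUN cd "$AOMP_REPO/bin" || return 1
    $RUN git checkout 0.7             # branch 0.7 of the aomp repo
    $RUN ./clone_aomp.sh              # pulls llvm-project branch AOMP-190715 and the other 0.7 branches
    $RUN ./build_project.sh install   # rebuilds and installs the affected clang components
}

update_aomp
```

Reviewing the printed commands first, then rerunning with `RUN=` set to empty, keeps the sketch harmless if the paths do not match your checkout.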


ye-luo commented Aug 1, 2019

@gregrodgers miniqmc now builds correctly with the latest compiler, which includes your fix.


ronlieb commented Aug 3, 2020

This cmake command fully enables device offload in our compiler:

cmake -DCMAKE_CXX_COMPILER=/usr/lib/aomp_11.7-1/bin/clang++ \
-DENABLE_OFFLOAD=1 -DOFFLOAD_TARGET=amdgcn-amd-amdhsa \
-DCMAKE_CXX_FLAGS="-target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx900" \
..


ronlieb commented Aug 3, 2020

With that, I see the following error on 11.7-1:
OMP_NUM_THREADS=4 ./bin/check_spo -n 10
miniqmc git branch: OMP_offload
miniqmc git commit: fe7367fc3d5d768db264d6613dd40d97e24cab03

Number of orbitals/splines = 192
Tile size = 192
Number of tiles = 1
Rmax = 1.7
Iterations = 10
OpenMP threads = 4

SPO coefficients size = 98304000 bytes (93.75 MB)
[GPU Memory Error] Addr: 0x7f19b4579000 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-1 (Agent handle: 0x1171ce0) on address 0x7f19b4579000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
