Implement vjps of vjps for HPE and SAP (CPU) #52

frostedoyster · 2024-04-13T13:01:05Z

This is needed to train models on forces and stresses, for example
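
As a minimal sketch of why force training needs second derivatives (a hypothetical one-parameter toy model, not the mops API): forces are already a first derivative of the energy, so the gradient of a force loss with respect to the model parameters involves a mixed second derivative. The names `energy`, `force`, and `force_loss` are made up for illustration.

```python
# Hypothetical toy model, not the mops API: E(x; w) = w * x**3
def energy(w, x):
    return w * x**3


def force(w, x):
    # F = -dE/dx: the force is already a first derivative of the model output
    return -3.0 * w * x**2


def force_loss(w, x, f_ref):
    # Squared error on the force
    return (force(w, x) - f_ref) ** 2


def force_loss_grad_w(w, x, f_ref):
    # dL/dw = 2 (F - F_ref) * dF/dw, where dF/dw = -d2E/(dx dw) = -3 x**2
    # is a second derivative of the energy: this is the "backward of backward"
    return 2.0 * (force(w, x) - f_ref) * (-3.0 * x**2)


w, x, f_ref = 0.7, 1.3, -2.0
eps = 1e-6
# Central finite difference of the loss with respect to the parameter w
numeric = (force_loss(w + eps, x, f_ref) - force_loss(w - eps, x, f_ref)) / (2 * eps)
analytic = force_loss_grad_w(w, x, f_ref)
```

In an autograd framework, the same mixed derivative is obtained by differentiating through the backward pass, which is why each operation needs a vjp of its vjp.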

Luthaf

This looks good overall, but I don't understand what's going on in the actual implementations.

mops/include/mops/opsaw.hpp

mops/src/hpe/cpu.tpp

mops/src/sap/cpu.tpp

python/mops-torch/tests/sasaw.py

Luthaf · 2024-04-17T13:09:08Z

Regarding the test failure: this could be caused by a data race in OpenMP. You could try running the tests without OpenMP to check whether that's the case, and/or compiling the code with the thread sanitizer, which should catch some issues at runtime.
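
The thread does not identify where the race actually occurs. Purely as an illustration of one common pattern in gradient kernels, the sketch below assumes a scatter-style accumulation in which several workers may add into the same output slot, and shows one standard remedy: each worker accumulates into a private buffer and the buffers are reduced at the end. All names (`scatter_add_serial`, `scatter_add_private`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_add_serial(values, targets, n_out):
    # Reference: out[targets[i]] += values[i], one element at a time
    out = [0.0] * n_out
    for v, t in zip(values, targets):
        out[t] += v
    return out


def scatter_add_private(values, targets, n_out, n_workers=4):
    # Each worker writes only to its own buffer, so no two workers
    # ever update the same memory location concurrently.
    def partial(chunk):
        local = [0.0] * n_out
        for v, t in chunk:
            local[t] += v
        return local

    pairs = list(zip(values, targets))
    chunks = [pairs[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial, chunks))
    # Final reduction over the private buffers
    return [sum(column) for column in zip(*partials)]


# Multiples of 0.5 keep the sums exact regardless of summation order
values = [0.5 * i for i in range(20)]
targets = [i % 3 for i in range(20)]  # many inputs map to the same output slot
serial = scatter_add_serial(values, targets, n_out=3)
parallel = scatter_add_private(values, targets, n_out=3)
```

The private-buffer approach trades extra memory for the absence of shared writes; atomic updates are the usual alternative when the output is too large to replicate per worker.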

frostedoyster · 2024-04-19T10:00:52Z

An update on the bug:

Thanks @Luthaf for the suggestion to look into parallelism and to use the thread sanitizer. Parallelism was indeed the issue: removing OpenMP parallelism from HPE eliminates the test failure. However, I found it nearly impossible to test the code with the thread sanitizer. On my laptop (Ubuntu 23.10), a known bug prevents the thread sanitizer from working properly ("unknown memory mapping"). On CI (Ubuntu 20.04), another known bug causes the compiled thread sanitizer library to be missing, so the linker fails. On Jed, it is very difficult to build mops in the first place, because it requires a workaround: Jed has a CUDA compiler, but libtorch_cuda.so is missing.

To that, I should add that the double backward of HPE is virtually useless for us: HPE cannot be used during training of equivariant models (where SAP would take its place). However, if one uses ACE/WK, the trained model can be re-expressed in terms of a few calls to HPE (rather than many to SAP), and that is its main use case. As a result, the double backward for HPE is there just for completeness, but it is virtually useless in our field (and I suspect in general... who's going to need VJP of VJP of homogeneous polynomials?).

As a result, I disabled parallelism for the VJP of VJP of HPE and I'm going to open an issue to fix it (which will be very low-priority). I will also open an issue to document the workaround to build mops-torch without CUDA in environments where a CUDA compiler is available (this PR is a good start, because it adds a MOPS_CUDA option rather than just looking at the availability of a CUDA compiler).

Regarding this PR, I will try to address the review from @Luthaf and then he will be able to review again.
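
To make "VJP of VJP of homogeneous polynomials" concrete, here is a dependency-free sketch on a toy polynomial, checked against finite differences in the spirit of a gradgradcheck test. It assumes a single-sample form `O = sum_j C[j] * prod_k A[P[j][k]]`; the function names, signatures, and data layout are hypothetical and are not the mops API.

```python
from math import prod


def hpe(A, C, P):
    # O = sum_j C[j] * prod_k A[P[j][k]]
    return sum(c * prod(A[i] for i in idx) for c, idx in zip(C, P))


def hpe_vjp(g, A, C, P):
    # grad_A[a] = g * dO/dA[a], with g the cotangent of the scalar output
    grad_A = [0.0] * len(A)
    for c, idx in zip(C, P):
        for k, a in enumerate(idx):
            rest = prod(A[i] for m, i in enumerate(idx) if m != k)
            grad_A[a] += g * c * rest
    return grad_A


def hpe_vjp_vjp(u, g, A, C, P):
    # VJP of hpe_vjp with cotangent u (same shape as grad_A):
    # returns the gradients of sum_a u[a] * grad_A[a] with respect to g and A
    grad_g = 0.0
    grad2_A = [0.0] * len(A)
    for c, idx in zip(C, P):
        for k, a in enumerate(idx):
            rest = prod(A[i] for m, i in enumerate(idx) if m != k)
            grad_g += u[a] * c * rest
            for l, b in enumerate(idx):
                if l == k:
                    continue
                rest2 = prod(A[i] for m, i in enumerate(idx) if m not in (k, l))
                grad2_A[b] += u[a] * g * c * rest2
    return grad_g, grad2_A


def contracted(u, g, A, C, P):
    # Scalar whose gradients hpe_vjp_vjp is supposed to return
    return sum(ui * vi for ui, vi in zip(u, hpe_vjp(g, A, C, P)))


C = [0.5, -1.2]
P = [(0, 1, 2), (1, 1, 2)]  # two degree-3 monomials, one with a repeated index
A = [0.3, -0.8, 1.1]
g = 0.9
u = [0.4, -0.6, 1.5]
eps = 1e-6

grad_g, grad2_A = hpe_vjp_vjp(u, g, A, C, P)

# Central finite differences of the contracted first-order VJP
fd_g = (contracted(u, g + eps, A, C, P) - contracted(u, g - eps, A, C, P)) / (2 * eps)
fd_A = []
for b in range(len(A)):
    A_plus = list(A)
    A_minus = list(A)
    A_plus[b] += eps
    A_minus[b] -= eps
    fd_A.append((contracted(u, g, A_plus, C, P) - contracted(u, g, A_minus, C, P)) / (2 * eps))

max_err = max([abs(grad_g - fd_g)] + [abs(x - y) for x, y in zip(grad2_A, fd_A)])
```

The accumulations into `grad_A` and `grad2_A` are exactly the kind of shared writes that need care once the outer loops are parallelized.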

mops/include/mops/hpe.hpp

mops/include/mops/opsaw.hpp

Luthaf · 2024-04-29T13:32:27Z

Let's keep this moving, we can improve the docs further later!

frostedoyster added 2 commits April 13, 2024 15:00

gradgradcheck tests to catch the issue

03545bd

Make backward operations their own autograd node

033378d

frostedoyster force-pushed the second-derivatives-cpu branch 3 times, most recently from f0d2ec8 to d0cef09 on April 14, 2024 09:43

Scaffold for vjps of vjps

142bcac

frostedoyster force-pushed the second-derivatives-cpu branch from d0cef09 to 142bcac on April 14, 2024 12:13

frostedoyster added 2 commits April 14, 2024 14:14

Unify and organize checks in C++/CUDA

36de392

Need to debug CPU-only failure on CI from CUDA laptop, so add MOPS_CUDA option in cmake

fa07621

frostedoyster force-pushed the second-derivatives-cpu branch from a346353 to fa07621 on April 14, 2024 15:41

frostedoyster added 4 commits April 14, 2024 17:54

Add options not to compute some gradients in the reference vjps

da8fb7a

Reference implementations for the vjps of vjps

19c4df0

Implement vjp of vjp for HPE

c4a0f06

Implement vjp of vjp for SAP

64b91ab

frostedoyster changed the title ~~Implement vjps of vjps on CPU~~ Implement vjps of vjps for HPE and SAP (CPU) Apr 16, 2024

Do not use CPU-specific bounds check on CUDA

531b23b

frostedoyster force-pushed the second-derivatives-cpu branch from 40cfcec to 531b23b on April 16, 2024 06:26

Make sure remainder loops are tested for HPE and SAP (not multiple of simd_size)

e2edc9e

frostedoyster force-pushed the second-derivatives-cpu branch from 4b4abed to 9bc1555 on April 16, 2024 07:22

frostedoyster requested a review from Luthaf April 16, 2024 07:29

frostedoyster force-pushed the second-derivatives-cpu branch 3 times, most recently from 8fd61df to ff1eab3 on April 16, 2024 10:16

Fix CI tests

5f1a125

frostedoyster force-pushed the second-derivatives-cpu branch from ff1eab3 to 5f1a125 on April 16, 2024 10:25

Luthaf reviewed Apr 17, 2024

mops/include/mops/opsaw.hpp (resolved)

mops/src/hpe/cpu.tpp (resolved)

mops/src/sap/cpu.tpp (outdated, resolved)

mops/src/sap/cpu.tpp (resolved)

python/mops-torch/tests/sasaw.py (outdated, resolved)

frostedoyster force-pushed the second-derivatives-cpu branch 2 times, most recently from 1abe770 to 9791119 on April 19, 2024 05:30

Merge branch 'main' into second-derivatives-cpu

883bb8b

frostedoyster force-pushed the second-derivatives-cpu branch 3 times, most recently from 7912129 to a86c3ca on April 19, 2024 09:55

Give up debugging and remove parallelism for hpe_vjp_vjp

d28b1fe

frostedoyster force-pushed the second-derivatives-cpu branch from a86c3ca to d28b1fe on April 19, 2024 09:55

frostedoyster marked this pull request as ready for review April 19, 2024 15:14

frostedoyster requested a review from Luthaf April 19, 2024 15:14

Address review

2947ab4

frostedoyster force-pushed the second-derivatives-cpu branch from b8b1989 to 2947ab4 on April 19, 2024 19:59

Luthaf reviewed Apr 23, 2024

mops/include/mops/hpe.hpp (outdated, resolved)

mops/include/mops/hpe.hpp (resolved)

mops/include/mops/hpe.hpp (outdated, resolved)

mops/include/mops/opsaw.hpp (resolved)

Finish documentation for the C++ functions

8c0f756

frostedoyster requested a review from Luthaf April 28, 2024 13:26

Luthaf approved these changes Apr 29, 2024

Luthaf merged commit ca81398 into main Apr 29, 2024
4 checks passed

Luthaf deleted the second-derivatives-cpu branch April 29, 2024 13:32
