-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SplineC2C/R2R rotation with BLAS #4710
Conversation
Let's be careful changes in derived classes aren't lost as we move to templated classes in #4684 |
There may be some subtleties related to precision. If a single precision spline is used, this change reduces the rotation matrix to single precision before the multiplication. |
} | ||
} | ||
rot_mat_padded[i * Nsplines + j] = rot_mat.data()[i * OrbitalSetSize + j]; | ||
BLAS::gemm('N', 'N', Nsplines, BasisSetSize, Nsplines, ST(1.0), rot_mat_padded.data(), Nsplines, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we don't need rot_mat_padded
.
BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a precision issue, unless BLAS::gemm can handle matrices with different precision types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having trouble getting the BLAS call to work without padding. I am trying to figure it out now
I agree there is still a precision issue. I wasn't sure the best way to handle that either. the spline coeffs are ST type and the rotation is RealType. The BLAS::gemm doesn't like mixed precision types, so I had to do some conversion somewhere. If I changed coef_copy_ back to RealType, and the rotation is RealType, the problem then lies in spl_coefs. It is ST*, and I can't just cast it to RealType* in the case of ST=float.
Making another temporary copy would be costly memory wise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you intending to work with single or double precision splines? I ran into some problems with rotation and single precision splines early on, and since then have only used double precision splines.
The code might need to be split differently based spline precision - call gemm for double precision, and an explicit loop for single precision. That would at least speed up the double precision case.
The precision issue also relates to the history list vs. global rotation method. With the history list, the rotations get accumulated in the existing coefficient storage. Single precision coefficients fail at this accumulation of small changes. With the global rotation method this might not be an issue, since this accumulation of rotations happens in the rotation matrix. It might work to compute the rotation matrix in double precision, then convert to single precision, and then do the matrix multiplication. But that would need to be tested, since the matrix multiplication step might also be sensitive to precision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer double only, but some of the unit tests use float so I was trying to get it to work in general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be okay with different code paths for float vs double (use the loop for float and gemm for double). Using "if constexpr" to test the type should make it straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdewing, added a split depending on float vs. double. this should avoid the loss in precision.
@ye-luo still trying to figure out how to avoid adding the padded rotation matrix. I thought it should be as simple as fiddling with the LDx parameters, but having trouble getting it to work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A code path split by precision should also avoid the need for the extra padded rotation matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ye-luo I think I have the blas call correct now to avoid needing to construct the padded rotation matrix. It is working correctly on some test matrices, and unit tests pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we don't need
rot_mat_padded
.BLAS::gemm('N', 'N', OrbitalSetSize, BasisSetSize, OrbitalSetSize, ST(1.0), rot_mat.data(), OrbitalSetSize, (*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines);
I should have read this in detail sooner. I didn't realize you had already told me how to it. Took me a bit, but I stumbled onto doing correctly
Thanks for pointing this out. I had two PRs recently (#4706, #4701) that touched RotatedSPOs that doesn't look like it has been captured in #4684. I can make a PR into @williamfgc's branch with the corresponding changes to bring RotatedSPOs up to date. And if this gets merged once I sort out the review comments can do the same for this if that works |
@camelto2 @quantumsteve thanks for bringing this up. Your changes can be accommodated into the refactoring effort after merged into develop, we just won't touch it for now. We rebase develop constantly. |
(*coef_copy_).data(), Nsplines, ST(0.0), spl_coefs, Nsplines); | ||
} | ||
else | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tough: apparently everything I did had hidden type mis-matches because ST is not guaranteed to be equivalent to ValueType. Seems to me the underlying problem is the type system of the splines is kinda divorced from the rest of the code. If it's true that single precision may be problematic for rotations, then why not ditch single precision splines and keep the production code simple? Or is there some way to enforce double precision splines if rotation is added? Just trying to think of ways to avoid adding complexity to production code if not absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is single precision splines are still of interest for size reasons.
Figured out how to do C2C correctly using the BLAS call. added that here as well |
Test this please |
else | ||
{ | ||
// if ST is float, RealType is double and ValueType is std::complex<double> for C2C | ||
// Just use naive matrix multiplication in order to avoid losing precision on rotation matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of doubt precision loss matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markdewing was mentioning that he had run into problems with precision when using the float splines and wanted the split between double and float. Maybe he can chime in with more details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were cases where splines with float precision had trouble with orbital rotation optimization, and those problems went away when I switched to double precision splines. I haven't investigated further.
The important point for these code paths is that the type of the spline (ST) and the main code type (RealType) have to match for BLAS calls to work without additional copies. For full precision builds, that means double precision splines. For mixed precision builds, I would guess the single precision splines would follow the BLAS code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How it stands currently is that coefs_copy_ and spl_coefs are ST, whereas rot_mat is passed in as ValueType. There isn't a mixed precision blas call, so in order to use blas we would have to make a copy of rot_mat of ST type. That would allow BLAS to be used always, but you are losing precision when ST is float since it is passed in as ValueType (double or std::complex). I'm not entirely sure how much that precision matters, since the output spline coeffs are ST anyway. We are doing an extra copy, but it is only Norb^2 so shouldn't be a problem.
How it is currently written avoids the extra copy of rot_mat, but only benefits from BLAS when ST == RealType.
How common are runs where ST != RealType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NiO performance tests have the splines set with precision="single", so I think it's still an important case in general. I'm not sure how important it will be for cases involving rotation. Though my guess is someone will want to try it due to memory pressure on the size of the spline coefficients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us defer the investigation of precision. It is important but we need the feature working first.
Test this please |
Agreeing with Ye here - let us focus on getting the feature working. |
Please review the developer documentation
on the wiki of this project that contains help and requirements.
Proposed changes
change naive matrix multiplication in SplineR2R::applyRotation to a blas call. Speeds the applyRotation call up by quite a bit.
note that these changes should be covered by the unit test, just changing how the matrix multiplication is done
Update: also added the corresponding change needed for SplineC2C.
What type(s) of changes does this code introduce?
Delete the items that do not apply
New feature
Other (please describe):
Does this introduce a breaking change?
What systems has this change been tested on?
macOS
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.