Scattering OMP parallelisation #243
Conversation
Nice PR! Let's think of a unit test, for example just a set of fixed particles in a box.
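A minimal sketch of what such a test could check (the helper below is hypothetical and not the Faunus API): for two fixed particles a distance r apart, the Debye-formula intensity has the analytic value 2 + 2·sin(qr)/(qr), which a unit test can compare against directly.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reference implementation (not the Faunus API): Debye formula
// I(q) = sum_ij sin(q r_ij) / (q r_ij), where the i == j self terms equal 1.
double debye_intensity(const std::vector<std::array<double, 3>>& pos, double q) {
    double intensity = 0;
    for (std::size_t i = 0; i < pos.size(); ++i) {
        for (std::size_t j = 0; j < pos.size(); ++j) {
            const double dx = pos[i][0] - pos[j][0];
            const double dy = pos[i][1] - pos[j][1];
            const double dz = pos[i][2] - pos[j][2];
            const double qr = q * std::sqrt(dx * dx + dy * dy + dz * dz);
            intensity += (qr > 0) ? std::sin(qr) / qr : 1.0; // self term -> 1
        }
    }
    return intensity;
}
```

For two particles separated by r = 2 at q = 0.5 (so qr = 1), the expected intensity is 2 + 2·sin(1).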
src/analysis.cpp
Outdated
```diff
     break;
   }
 }

 ScatteringFunction::ScatteringFunction(const json &j, Space &spc) try : spc(spc) {
     from_json(j);
     name = "scatter";
-    usecom = j.value("com", true);
+    use_com = j.value("com", true);
     save_after_sample = j.value("stepsave", false); // save to disk for each sample
```
Add keys to docs/schema.yml
And to _docs/analysis.md
How did this slip through from pull request #241? :-D Done. The documentation shall also cover `ipbc`. As `ipbc` has no meaning for the `debye` scheme, shall we keep it as an independent attribute, or shall we have three different schemes?
Yes, debye is quite different and also requires dq, qmin, and qmax, while these are automatic with explicit and ipbc, which on the other hand require qmax. I would keep ipbc as an option in the explicit scheme and split debye into a separate analysis.
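A hypothetical input layout for such a split (key and analysis names are illustrative only, not the final schema):

```yaml
analysis:
    # explicit scheme with ipbc as a plain option; only qmax needed
    - scatter: {nstep: 10, molecules: [water], scheme: explicit, ipbc: true, qmax: 2.0}
    # debye as its own analysis with an explicit q-grid
    - scatterdebye: {nstep: 10, molecules: [water], qmin: 0.05, qmax: 2.0, dq: 0.01}
```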
src/scatter.h
Outdated
```cpp
}
// as of January 2020 the std::transform_reduce is not implemented in libc++
T sum_cos = 0;
for (auto &qr : qr_products) {
```
How well does Eigen fare for this operation? It's supposed to handle vectorisation on a number of architectures. I believe the operation here would be `qr_products.sin().sum()`. Note also that for the coming Eigen 4, iterators are supported.
Btw. your sin/cos optimization may be applicable also for Ewald summation:
Line 127 in 58adfe9
Q += i.charge * EwaldData::Tcomplex(std::cos(dot), std::sin(dot)); // 'Q^q', see eq. 25 in ref.
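The equivalence behind that optimisation can be sketched with the standard library alone (a toy check, not the actual Ewald code): accumulating separate weighted cos and sin sums yields the same structure factor as the complex accumulation, while being easier for the compiler to vectorise.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Complex accumulation, as in the quoted Ewald line:
// Q = sum_i charge_i * (cos(dot_i) + i*sin(dot_i))
std::complex<double> structure_factor_complex(const std::vector<double>& charge,
                                              const std::vector<double>& dot) {
    std::complex<double> Q = 0;
    for (std::size_t i = 0; i < charge.size(); ++i)
        Q += charge[i] * std::complex<double>(std::cos(dot[i]), std::sin(dot[i]));
    return Q;
}

// Same quantity via two separate real sums, which vectorise more readily.
std::complex<double> structure_factor_split(const std::vector<double>& charge,
                                            const std::vector<double>& dot) {
    double sum_cos = 0, sum_sin = 0;
    for (std::size_t i = 0; i < charge.size(); ++i) {
        sum_cos += charge[i] * std::cos(dot[i]);
        sum_sin += charge[i] * std::sin(dot[i]);
    }
    return {sum_cos, sum_sin};
}
```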
Update: on GCC 9/macOS, the separate cos/sin (ifdef __GNUC__) are slower.
The following gives a factor-four speed-up on GCC and Clang:

```cpp
Eigen::VectorXd qr_products;
qr_products.noalias() = positions * q;
T sum_sin = qr_products.array().sin().sum();
T sum_cos = qr_products.array().cos().sum();
```

where `positions` must be a `MatrixXd(N,3)` object. Best to await a unit test to see if it's doing the right thing, though :-)
Creating a merge commit is probably the best strategy for merging this pull request.
Different implementations exhibit very different performance (factor of 5 observed) depending on the compiler and system environment. Add unit and performance tests.
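A sketch of what such a parallelised O(N^2) pair loop could look like (simplified and illustrative, not the actual PR code): the outer loop is distributed over threads with an OpenMP reduction, and without -fopenmp the pragma is simply ignored, so the loop falls back to serial execution with identical results.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative O(N^2) pair sum over 1D positions with an OpenMP reduction.
// Compiled without -fopenmp, the pragma is ignored and the loop runs serially.
double pair_sum(const std::vector<double>& x, double q) {
    double s = 0;
    const long n = static_cast<long>(x.size());
#pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < n; ++i)
        for (long j = i + 1; j < n; ++j) {
            const double qr = q * std::fabs(x[i] - x[j]);
            s += (qr > 0) ? std::sin(qr) / qr : 1.0; // coincident pair -> 1
        }
    return s;
}
```

For positions {0, 1, 2} at q = 1 the pair distances are 1, 2, 1, so the sum is 2·sin(1) + sin(2)/2, which makes for an easy deterministic unit test regardless of thread count.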
Description
Refactor and parallelise the scattering analysis.
OpenMP is enabled for loops in the sample methods that have O(N^2) complexity, where N is the number of particles. Furthermore, the sample methods are rewritten to allow automatic vectorisation of trigonometric functions in GCC using the libmvec vector math library when --ffast-math is enabled. This speeds up the analysis by a factor of 4 on modern processors.
Checklist
Note
This pull request includes, and thus obsoletes, pull request #241.