
Scattering OMP parallelisation #243

Merged
rc83 merged 7 commits into mlund:master on Feb 7, 2020

Conversation

@rc83 (Collaborator) commented Feb 4, 2020

Description

Refactor and parallelise the scattering analysis.

OpenMP is enabled for loops in the sample methods that have O(N^2) complexity, where N is the number of particles.

Furthermore, the sample methods are rewritten to allow automatic vectorisation of trigonometric functions in GCC by using the libmvec vector math library when --ffast-math is enabled. This speeds up the analysis by a factor of 4 on modern processors.
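The PR's actual implementation is not reproduced here; as a rough, hedged sketch of the two ideas described above (OpenMP across the sampling loop, plus an inner trigonometric loop simple enough for GCC to vectorise through libmvec when --ffast-math is on), one might write something along these lines, with all names being placeholders:

#include <cmath>
#include <vector>

// Hypothetical sketch, not the PR's code: scattering intensity for a set of
// q-points from precomputed dot products q·r_i (one row per q-point).
std::vector<double> sample_intensity(const std::vector<std::vector<double>> &qr_products) {
    std::vector<double> intensity(qr_products.size());
    // The outer loop over q-points is shared among OpenMP threads.
#pragma omp parallel for
    for (size_t k = 0; k < qr_products.size(); ++k) {
        double sum_cos = 0, sum_sin = 0;
        // A plain loop over contiguous data; with --ffast-math GCC may replace
        // std::cos/std::sin by their vectorised libmvec counterparts.
        for (double qr : qr_products[k]) {
            sum_cos += std::cos(qr);
            sum_sin += std::sin(qr);
        }
        intensity[k] = sum_cos * sum_cos + sum_sin * sum_sin;
    }
    return intensity;
}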

Checklist

  • No unit tests were provided.
  • code naming scheme follows the convention
    • Constants for OMP ranges are single capital letters, which is a grey zone of the naming rules; the other rules are honoured. Variables have been extensively renamed from the single-letter physicist convention to English words to follow our programmers' convention.
  • the source code is well documented
  • the user manual is updated regarding the effective parallelisation
  • performance is checked in supported configurations: (GCC, Clang) × (GNU/Linux, MacOSX)

Note

This pull request includes, and thus supersedes, pull request #241.

@mlund added this to the Version 2.4.0 milestone on Feb 4, 2020
@mlund (Owner) left a comment


Nice PR! Let's think of a unit test, for example just a set of fixed particles in a box.
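A minimal sketch in that spirit, hedged, and reusing the hypothetical sample_intensity helper from the description sketch above rather than any Faunus symbol: for two fixed particles a distance d apart with q along the separation axis, |Σ exp(iq·r)|² = 2 + 2·cos(qd), which a doctest case can check directly.

#include <doctest/doctest.h>
#include <cmath>
#include <vector>

// Hypothetical unit test: two fixed particles separated by d, scattering
// vector of magnitude q along the separation axis.
TEST_CASE("scatter: fixed particles in a box") {
    const double d = 5.0, q = 0.3;
    const std::vector<std::vector<double>> qr_products = {{q * 0.0, q * d}};
    const double expected = 2.0 + 2.0 * std::cos(q * d); // |sum exp(iq·r)|^2
    CHECK(sample_intensity(qr_products).at(0) == doctest::Approx(expected));
}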

src/analysis.cpp Outdated
        break;
    }
}

ScatteringFunction::ScatteringFunction(const json &j, Space &spc) try : spc(spc) {
    from_json(j);
    name = "scatter";
-   usecom = j.value("com", true);
+   use_com = j.value("com", true);
    save_after_sample = j.value("stepsave", false); // save to disk for each sample
@mlund (Owner):

Add keys to docs/schema.yml

@mlund (Owner):

And to _docs/analysis.md

@rc83 (Collaborator, author):

How come this could come through like this from pull request #241? :-D Done. The documentation shall also cover ipbc. As ipbc has no meaning for the debye scheme, shall we keep it as an independent attribute, or shall we have three different schemes?

@mlund (Owner):

Yes, debye is quite different and also requires dq, qmin, qmax, while these are automatic with explicit and ipbc, which on the other hand require qmax. I would keep ipbc as an option in the explicit scheme and split debye into a separate analysis.
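A hedged sketch of how that split could look on the input-parsing side (the struct, the function, and the boolean flag are assumptions for illustration; only the keys dq, qmin, qmax, and ipbc come from the comment above):

#include <nlohmann/json.hpp>

// Hypothetical sketch: a separate "debye" analysis reads its own q-grid keys,
// while the explicit scheme (optionally with ipbc) needs only qmax.
struct ScatterInput {
    double dq = 0, qmin = 0, qmax = 0;
    bool ipbc = false;
};

ScatterInput parseScatterKeys(const nlohmann::json &j, bool debye_scheme) {
    ScatterInput in;
    if (debye_scheme) { // q-grid must be given explicitly by the user
        in.dq = j.at("dq").get<double>();
        in.qmin = j.at("qmin").get<double>();
        in.qmax = j.at("qmax").get<double>();
    } else {            // explicit scheme; the q-grid otherwise follows from the box
        in.qmax = j.at("qmax").get<double>();
        in.ipbc = j.value("ipbc", false);
    }
    return in;
}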

src/scatter.h Outdated
}
// as of January 2020 the std::transform_reduce is not implemented in libc++
T sum_cos = 0;
for (auto &qr : qr_products) {
@mlund (Owner):

How well does Eigen fare for this operation? It's supposed to handle vectorisation on a number of architectures. I believe the operation here would be qr_products.sin().sum(). Note also that for the coming Eigen 4, iterators are supported.

@mlund (Owner) commented Feb 4, 2020:

Btw., your sin/cos optimization may also be applicable to the Ewald summation:

Q += i.charge * EwaldData::Tcomplex(std::cos(dot), std::sin(dot)); // 'Q^q', see eq. 25 in ref.
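Purely as a hedged illustration of how the same split could carry over to the Ewald k-sum (the function and argument names are placeholders, not the Ewald code's), assuming the dot products k·r_i and the charges were first gathered into Eigen arrays:

#include <Eigen/Dense>
#include <complex>

// Hypothetical sketch: Q(k) = sum_i q_i * exp(i k·r_i), with cos/sin evaluated
// on whole arrays so that Eigen/the compiler can vectorise them.
std::complex<double> structureFactor(const Eigen::ArrayXd &dots, const Eigen::ArrayXd &charges) {
    const double real_part = (charges * dots.cos()).sum();
    const double imag_part = (charges * dots.sin()).sum();
    return {real_part, imag_part};
}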

@mlund (Owner):

Update: on GCC9/MacOS, the separate cos/sin loops (ifdef __GNUC__) are slower.

@mlund (Owner):

The following gives a factor four speed-up on GCC and Clang:

Eigen::VectorXd qr_products;
qr_products.noalias() = positions * q;
T sum_sin = qr_products.array().sin().sum();
T sum_cos = qr_products.array().cos().sum();

where positions must be a MatrixXd(N,3) object. Best to await a unit test to see if it's doing the right thing, though :-)
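For context, a hedged sketch of how the positions matrix and the two sums could be assembled (the particle container and the function name are placeholders, not code from the PR):

#include <Eigen/Dense>
#include <vector>

// Hypothetical glue for the snippet above: copy particle positions into an
// N x 3 matrix once per sample, then evaluate all q·r_i products in a single
// matrix-vector product before the vectorised sin/cos reductions.
void sumSinesCosines(const std::vector<Eigen::Vector3d> &positions_in, const Eigen::Vector3d &q,
                     double &sum_sin, double &sum_cos) {
    Eigen::MatrixXd positions(positions_in.size(), 3);
    for (size_t i = 0; i < positions_in.size(); ++i)
        positions.row(i) = positions_in[i].transpose();
    const Eigen::VectorXd qr_products = positions * q;
    sum_sin = qr_products.array().sin().sum();
    sum_cos = qr_products.array().cos().sum();
}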

@rc83 (Collaborator, author) commented Feb 6, 2020

  1. I had to revert one commit originating from the master branch to make the pull request compile.
  2. There is no code related to performance testing yet, neither in scatter.h nor in scatter_test.h. I will prepare another commit tomorrow.

Creating a merge commit is probably the best strategy for merging this pull request.

rc83 added 7 commits February 7, 2020 14:00
OpenMP is enabled for loops in the sample methods that have O(N^2)
complexity, where N is the number of particles.

Furthermore, the sample methods are rewritten to allow automatic
vectorisation of trigonometric functions in GCC by using the libmvec
vector math library when --ffast-math is enabled. This speeds up
the analysis by a factor of 4 on modern processors.

Different implementations exhibit very different performance (a factor
of 5 observed) depending on the compiler and system environment.

Add unit and performance tests.
@rc83 merged commit 280ad6c into mlund:master on Feb 7, 2020
@rc83 deleted the scatter-omp branch on February 21, 2020, 12:25