Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legality checks vs vectorizer features #16

Open
xoofx opened this issue Apr 30, 2018 · 1 comment
Open

Legality checks vs vectorizer features #16

xoofx opened this issue Apr 30, 2018 · 1 comment
Labels

Comments

@xoofx
Copy link
Contributor

xoofx commented Apr 30, 2018

Hey,
This is more a general question about the separation often cited between legality checks vs the features of a vectorizer (seen here, but also in Intel VPlan slides)

It is said "RV does not perform exhaustive legality checks ", but I'm failing to see how a legality check is not vectorizer dependent? For example, a vectorizer could be able to vectorize a conditional while another would not, but what would be the output of a legality check without the knowledge of way the vectorizer is working?
At some point, the legality check has to generate the same kind of "execution mask" for the code to check that there are no overlapping writes or read depending on previous writes...etc. so How this separation can work in practice?

@simoll
Copy link
Member

simoll commented May 3, 2018

You are right that the precise notion of legality depends on the vectorizer.
What is meant by separation of legality checks is that some parts of legality checking are not implemented in the vectorizer itself but offloaded to other analysis.

For example, the least common denominator among loop vectorizers is that there must not be any data races between loop iterations (within a minimal dependence distance).
A third party has to establish that property, eg an loop iteration analysis or users through pragmas (pragma omp simd, ..).
On the other hand, detection of recurrence/reduction patterns is usually handled by the vectorizer itself since it is very implementation specific.

In practice, regarding the example of overlapping memory accesses, i see the following approach for RV:

  1. have an analysis that detects potential runtime dependences.
  2. Transform the loop to make the dependences explicit: one approach is to insert vector-length agnostic intrinsics in the loop (cf vpconflict in Intels "FlexVec" PLDI'16 paper).
// scalar loop
for (int i = 0; i < n; ++i) {
  A[i] = A[f(i)];
}

// prepared loop (vector-length agnostic style)
for (int i = 0; i < n;) {
  iVec = f([i:i+n]);
  conflictIndex = rv_conflict([i:i+n], iVec)
  // the hypothetical rv_conflict returns the first index j
  // with j >= i where f(j) is in the interval [i:j].
  // Inputs are varying, return value is uniform.

  if (i < conflictIndex) {
    // this will execute without data race.
    A[i] = A[iVec];
  }
  i = conflictIndex + rv_strided(1);
  // advance to the next chunk,
  // rv_strided(1) will evaluate to [0, 1, ...].
}
  1. Run the default RV pipeline on the prepared loop. Since all dependences (on memory) have been encoded/resolved explicitely by means of RV's intrinsics, it will generate correct code.

@simoll simoll added the question label Aug 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants