-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compiler: Change tile use in DeviceAcczier to allow multiple tile sizes #2095
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! As a first thing to improve, before the rest, could you add tests to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR needs to be equipped with tests, somewhere here: https://github.com/devitocodes/devito/blob/master/tests/test_dle.py#L265
and at least one in test_gpu_openacc.py
as well
I've also left a batch of small comments
@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None): | |||
# TODO: still unable to exploit multiple par-tiles (one per nest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may now drop this comment
@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None): | |||
# TODO: still unable to exploit multiple par-tiles (one per nest) | |||
# This will require unconditionally applying blocking, and then infer | |||
# the tile clause shape from the BlockDimensions' step | |||
tile = self.par_tile[0] | |||
par_tile_length = len(self.par_tile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about simply n = ...
?
@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None): | |||
# TODO: still unable to exploit multiple par-tiles (one per nest) | |||
# This will require unconditionally applying blocking, and then infer | |||
# the tile clause shape from the BlockDimensions' step | |||
tile = self.par_tile[0] | |||
par_tile_length = len(self.par_tile) | |||
idx = index if index < par_tile_length else par_tile_length - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prefer explicit if-then-else:
if ...:
idx = ...
else:
idx = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and in practice this is just
index = min(index, par_tile_length-1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which one is more suitable?
tile = self.par_tile[min(index, len(self.par_tile) - 1)]
idx = min(index, len(self.par_tile) - 1)
tile = self.par_tile[idx]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the latter for homogeneity with the rest of the codebase.
devito/passes/iet/parpragma.py
Outdated
# Get the parallelizable Iterations in `tree` | ||
candidates = filter_iterations(tree, key=self.key) | ||
if not candidates: | ||
continue | ||
|
||
# Outer parallelism | ||
root, partree = self._make_partree(candidates) | ||
root, partree = self._make_partree(candidates, None, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of passing None:
self._make_partree(candidates, index=i)
…r in PragmaShmTransformer
…ng in test_gpu_openacc.py
Added test_openacc_structure in Added test_openacc_multi_tile_structure in Added test_multiple_tile_sizes in |
Requested code changes also committed |
tests/test_dle.py
Outdated
(((32, 4, 4), 1, 'tag0'), ((4, 4, 32), (4, 4, 32))), | ||
((((32, 4, 8), 1, 'tag0'), ((32, 8, 4), 2)), ((8, 4, 32), (4, 8, 32))), | ||
]) | ||
def test_openacc_structure(self, par_tile, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made a mistake; clearly these tests belong to test_gpu_openacc
, not to test_dle
which constitutes a platform-independent set of tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(By moving them there you'll have to rename them as well -- dropping "openacc" for example)
tests/test_dle.py
Outdated
assert all(i.step == j for i, j in zip(iters, v)) | ||
|
||
|
||
def test_openacc_multi_tile_structure(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this test conceptually different than the one above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed the first one was kind of unnecessary. I've dropped it and kept the second one, which is more relevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I think this is now good to go. Enabling tests...
Updated parameters receiving in |
Codecov Report
@@ Coverage Diff @@
## master #2095 +/- ##
==========================================
- Coverage 87.80% 87.56% -0.25%
==========================================
Files 221 221
Lines 38665 39042 +377
Branches 5000 5071 +71
==========================================
+ Hits 33951 34188 +237
- Misses 4164 4295 +131
- Partials 550 559 +9
|
A proposal to allow the use of multiple tile sizes in the same operator, through the part-tile flag.
Passing multiple tuples to the part-tile flag is already supported, but only the first tuple is applied in the _make_partree method of DeviceAcczier class.
The changes consist of supplying the nesting index in the make_parallel method of PragmaShmTransformer to the _make_partree method and using the tile size with the respective index.
Cases with different tile and nesting sizes are handled. Given
n
tile sizes for an operator withm
nests:If
n = m
: The i-th tile size is assigned to the i-th nesting.If
n > m
: The firstm
tile sizes are assigned to them
nests (i-th tile size -> i-th nesting), and the remainingn-m
is not used.If
n < m
: Then
tile sizes are assigned to the firstn
nestings, and the remainingm-n
nestings replicate the last tile size.