Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: Change tile use in DeviceAcczier to allow multiple tile sizes #2095

Merged
merged 6 commits into from
May 3, 2023

Conversation

gpc1064
Copy link

@gpc1064 gpc1064 commented Mar 29, 2023

A proposal to allow the use of multiple tile sizes in the same operator, through the part-tile flag.
Passing multiple tuples to the part-tile flag is already supported, but only the first tuple is applied in the _make_partree method of DeviceAcczier class.

The changes consist of supplying the nesting index in the make_parallel method of PragmaShmTransformer to the _make_partree method and using the tile size with the respective index.

Cases with different tile and nesting sizes are handled. Given n tile sizes for an operator with m nests:

If n = m: The i-th tile size is assigned to the i-th nesting.
If n > m: The first m tile sizes are assigned to the m nests (i-th tile size -> i-th nesting), and the remaining n-m is not used.
If n < m: The n tile sizes are assigned to the first n nestings, and the remaining m-n nestings replicate the last tile size.

Copy link
Contributor

@georgebisbas georgebisbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! As a first thing to improve, before the rest, could you add tests to this PR?

Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR needs to be equipped with tests, somewhere here: https://github.com/devitocodes/devito/blob/master/tests/test_dle.py#L265

and at least one in test_gpu_openacc.py as well

I've also left a batch of small comments

@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None):
# TODO: still unable to exploit multiple par-tiles (one per nest)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may now drop this comment

@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None):
# TODO: still unable to exploit multiple par-tiles (one per nest)
# This will require unconditionally applying blocking, and then infer
# the tile clause shape from the BlockDimensions' step
tile = self.par_tile[0]
par_tile_length = len(self.par_tile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about simply n = ... ?

@@ -183,7 +183,10 @@ def _make_partree(self, candidates, nthreads=None):
# TODO: still unable to exploit multiple par-tiles (one per nest)
# This will require unconditionally applying blocking, and then infer
# the tile clause shape from the BlockDimensions' step
tile = self.par_tile[0]
par_tile_length = len(self.par_tile)
idx = index if index < par_tile_length else par_tile_length - 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prefer explicit if-then-else:

if ...:
    idx = ...
else:
    idx = ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and in practice this is just

index = min(index, par_tile_length-1)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one is more suitable?

tile = self.par_tile[min(index, len(self.par_tile) - 1)]
idx = min(index, len(self.par_tile) - 1)
tile = self.par_tile[idx]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say the latter for homogeneity with the rest of the codebase.

# Get the parallelizable Iterations in `tree`
candidates = filter_iterations(tree, key=self.key)
if not candidates:
continue

# Outer parallelism
root, partree = self._make_partree(candidates)
root, partree = self._make_partree(candidates, None, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of passing None:

self._make_partree(candidates, index=i)

@gpc1064
Copy link
Author

gpc1064 commented Mar 31, 2023

Thanks for this! As a first thing to improve, before the rest, could you add tests to this PR?

This PR needs to be equipped with tests, somewhere here: https://github.com/devitocodes/devito/blob/master/tests/test_dle.py#L265

and at least one in test_gpu_openacc.py as well

I've also left a batch of small comments

Added test_openacc_structure in test_dle.py::TestBlockingParTile to ensure correct blocking while used with OpenACC's tile.

Added test_openacc_multi_tile_structure in test_dle.py::TestBlockingParTile to ensure correct blocking and applying multiple tile sizes with OpenACC.

Added test_multiple_tile_sizes in test_gpu_openacc.py::TestCodeGeneration ensuring that multiple tile sizes are applied correctly across nests.

@gpc1064
Copy link
Author

gpc1064 commented Mar 31, 2023

Requested code changes also committed

(((32, 4, 4), 1, 'tag0'), ((4, 4, 32), (4, 4, 32))),
((((32, 4, 8), 1, 'tag0'), ((32, 8, 4), 2)), ((8, 4, 32), (4, 8, 32))),
])
def test_openacc_structure(self, par_tile, expected):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made a mistake; clearly these tests belong to test_gpu_openacc, not to test_dle which constitutes a platform-independent set of tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(By moving them there you'll have to rename them as well -- dropping "openacc" for example)

assert all(i.step == j for i, j in zip(iters, v))


def test_openacc_multi_tile_structure(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this test conceptually different than the one above?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the first one was kind of unnecessary. I've dropped it and kept the second one, which is more relevant

Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I think this is now good to go. Enabling tests...

@gpc1064
Copy link
Author

gpc1064 commented Apr 20, 2023

Updated parameters receiving in _make_partree to include index. Local tests ran smoothly :)

@codecov
Copy link

codecov bot commented Apr 27, 2023

Codecov Report

Merging #2095 (c294588) into master (d0e705d) will decrease coverage by 0.25%.
The diff coverage is 17.50%.

@@            Coverage Diff             @@
##           master    #2095      +/-   ##
==========================================
- Coverage   87.80%   87.56%   -0.25%     
==========================================
  Files         221      221              
  Lines       38665    39042     +377     
  Branches     5000     5071      +71     
==========================================
+ Hits        33951    34188     +237     
- Misses       4164     4295     +131     
- Partials      550      559       +9     
Impacted Files Coverage Δ
tests/test_gpu_openacc.py 5.69% <6.06%> (-1.40%) ⬇️
devito/passes/iet/languages/openacc.py 65.00% <33.33%> (-0.55%) ⬇️
devito/passes/iet/parpragma.py 87.61% <100.00%> (+0.95%) ⬆️

... and 34 files with indirect coverage changes

@FabioLuporini FabioLuporini merged commit 12544a8 into devitocodes:master May 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants