Re-implement box and region types #204

fknorr · 2023-08-15T10:01:32Z

What started as academic curiosity turned into a classic month-long summertime deep-dive into a topic that is not even close to the top of our priority list. Oh well.

Currently, Celerity uses AllScale's GridBox and GridRegion types to describe point sets that originate from boxes or tilings of boxes. Out of these, GridRegion is particularly interesting: Internally a vector<GridBox>, it ensures that the tiling does not contain overlaps, and also opportunistically merges adjacent boxes when computing unions.

On the path to this PR, I set out on the following mission:

1. Concisely re-implement box and region types in a way that fits current Celerity, namely through an int dimensions parameter, unsigned coordinate types and support for 0-dimensional instances.
2. Investigate whether there's a better, or even optimal way of computing minimal, non-overlapping tilings for regions.
3. Optimize the implementation of union, intersection and difference operations without resorting to a full R-tree.

The user-facing API is mostly self-explanatory and similar to the old one, but the inner workings require some explanation. I have made an effort to document all algorithmic decisions as well as possible.

Finally, this PR is able to remove the entirety of AllScale from our dependencies (some 20k LOC) 🥳

Region Normalization

The key innovation in the new region implementation is normalization. While regions are still represented as vectors of boxes, the new region type normalizes the tiling such that any two regions that cover the same points have identical box sequences. Normalization of an arbitrary user-provided box set works as follows:

1. Remove all empty boxes
2. In all dimensions except the last ("fastest"), collect the set of box boundaries (minima and maxima) in that dimension as the set of dissection lines
3. Dissect all boxes along the obtained lines
4. Remove all pairwise fully-covered boxes
5. Merge connected / overlapping boxes, starting along the last ("fastest") dimension and proceeding up to the first ("slowest") dimension
6. Sort boxes according to their coordinates

The resulting representation is unique for the covered set of points and maximizes the extent along the last ("fastest") dimension while minimizing the number of total boxes. A small example in 2D:
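As a rough illustration, the steps above can be sketched for the 2D case as follows. This is a simplified sketch and not the code from src/grid.cc: `box2` and `normalize_2d` are hypothetical names, boxes are assumed to be half-open `[min, max)`, and the merge along the last dimension also absorbs overlapping and fully covered boxes, so there is no separate removal step.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical 2D box covering the half-open range [min, max) in each dimension.
struct box2 {
    std::array<std::size_t, 2> min;
    std::array<std::size_t, 2> max;
    bool operator==(const box2& o) const { return min == o.min && max == o.max; }
};

std::vector<box2> normalize_2d(std::vector<box2> boxes) {
    // 1. Remove all empty boxes.
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                    [](const box2& b) { return b.min[0] >= b.max[0] || b.min[1] >= b.max[1]; }),
        boxes.end());

    // 2. Collect the dissection lines in dimension 0 (every dimension except the last).
    std::vector<std::size_t> cuts;
    for (const auto& b : boxes) {
        cuts.push_back(b.min[0]);
        cuts.push_back(b.max[0]);
    }
    std::sort(cuts.begin(), cuts.end());
    cuts.erase(std::unique(cuts.begin(), cuts.end()), cuts.end());

    // 3. Dissect every box so that each piece spans exactly one slab [cuts[i], cuts[i+1]).
    std::vector<box2> slabs;
    for (const auto& b : boxes) {
        for (std::size_t i = 0; i + 1 < cuts.size(); ++i) {
            if (cuts[i] >= b.min[0] && cuts[i + 1] <= b.max[0]) {
                slabs.push_back(box2{{cuts[i], b.min[1]}, {cuts[i + 1], b.max[1]}});
            }
        }
    }

    // 4. Merge along the last ("fastest") dimension within each slab.
    //    Touching, overlapping and fully covered pieces all collapse here.
    const auto by_coords = [](const box2& a, const box2& b) {
        return std::tie(a.min[0], a.min[1]) < std::tie(b.min[0], b.min[1]);
    };
    std::sort(slabs.begin(), slabs.end(), by_coords);
    std::vector<box2> merged;
    for (const auto& b : slabs) {
        if (!merged.empty() && merged.back().min[0] == b.min[0] && b.min[1] <= merged.back().max[1]) {
            merged.back().max[1] = std::max(merged.back().max[1], b.max[1]);
        } else {
            merged.push_back(b);
        }
    }

    // 5. Merge along the first ("slowest") dimension: adjacent slabs with identical extent in dimension 1.
    std::sort(merged.begin(), merged.end(), [](const box2& a, const box2& b) {
        return std::tie(a.min[1], a.max[1], a.min[0]) < std::tie(b.min[1], b.max[1], b.min[0]);
    });
    std::vector<box2> result;
    for (const auto& b : merged) {
        if (!result.empty() && result.back().min[1] == b.min[1] && result.back().max[1] == b.max[1]
            && result.back().max[0] == b.min[0]) {
            result.back().max[0] = b.max[0];
        } else {
            result.push_back(b);
        }
    }

    // 6. Sort boxes by their coordinates to obtain a unique sequence.
    std::sort(result.begin(), result.end(), by_coords);
    return result;
}
```

With this sketch, an L-shaped point set given as `{(0,0)-(2,4), (2,0)-(4,2)}` and the same set given as `{(0,0)-(4,2), (0,2)-(2,4)}` both normalize to the first form, which has the longer extent along the last dimension.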

All region set operations (union, intersection, difference) work on these normalized regions and take the appropriate step to ensure their result remains normalized as well.
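To make the set operations a bit more concrete, here is a minimal sketch of a region intersection built from pairwise box intersections. The names `box2`, `intersect` and `intersect_regions` are hypothetical, boxes are assumed half-open, and the result is only a non-overlapping tiling when both inputs are non-overlapping; it would still have to go through the merge and sort steps to be normalized in the sense described above.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 2D box covering the half-open range [min, max) in each dimension.
struct box2 {
    std::array<std::size_t, 2> min;
    std::array<std::size_t, 2> max;
};

bool is_empty(const box2& b) {
    return b.min[0] >= b.max[0] || b.min[1] >= b.max[1];
}

// The intersection of two boxes is again a box (possibly an empty one).
box2 intersect(const box2& a, const box2& b) {
    box2 r{};
    for (std::size_t d = 0; d < 2; ++d) {
        r.min[d] = std::max(a.min[d], b.min[d]);
        r.max[d] = std::min(a.max[d], b.max[d]);
    }
    return r;
}

// Pairwise intersection of two box lists, dropping empty results.
std::vector<box2> intersect_regions(const std::vector<box2>& lhs, const std::vector<box2>& rhs) {
    std::vector<box2> out;
    for (const auto& a : lhs) {
        for (const auto& b : rhs) {
            const box2 r = intersect(a, b);
            if (!is_empty(r)) out.push_back(r);
        }
    }
    return out; // not yet merged or sorted
}
```

This quadratic pairing is only meant to show the principle; the PR states that the actual implementation is optimized without resorting to a full R-tree.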

Effective Dimensionality

As we saw in the implementation of the new region map, treating every region as three-dimensional for the purpose of "dimensionality erasure" can be costly. This is why this PR introduces the concept of effective dimensionality:

If the last k coordinates of an n-dimensional range are 1, of an id are 0 or of a box are (0, 1), its effective dimensionality is n-k.
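The definition translates almost directly into code. The following is a minimal sketch under the assumption that ranges and ids are plain coordinate arrays and that a box is a half-open `[min, max)` pair; the function names are hypothetical and not the ones used in the PR.

```cpp
#include <array>
#include <cstddef>

// Effective dimensionality of a range: strip trailing extents equal to 1.
template <std::size_t Dims>
std::size_t effective_dims_of_range(const std::array<std::size_t, Dims>& range) {
    std::size_t d = Dims;
    while (d > 0 && range[d - 1] == 1) --d;
    return d;
}

// Effective dimensionality of an id: strip trailing coordinates equal to 0.
template <std::size_t Dims>
std::size_t effective_dims_of_id(const std::array<std::size_t, Dims>& id) {
    std::size_t d = Dims;
    while (d > 0 && id[d - 1] == 0) --d;
    return d;
}

// Effective dimensionality of a box [min, max): strip trailing dimensions that span exactly (0, 1).
template <std::size_t Dims>
std::size_t effective_dims_of_box(const std::array<std::size_t, Dims>& min, const std::array<std::size_t, Dims>& max) {
    std::size_t d = Dims;
    while (d > 0 && min[d - 1] == 0 && max[d - 1] == 1) --d;
    return d;
}
```

For instance, a range `{8, 1, 1}` has effective dimensionality 1, while `{1, 4, 1}` has effective dimensionality 2 because only trailing coordinates are stripped.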

Practically this means that the result of a box_cast<3>(box<1>) has storage dimensionality 3 but retains effective dimensionality 1, and it can be box_cast back to a 1-dimensional box without data loss.

The efficiency of region algorithms profits from knowledge about the data's effective dimensionality. Instead of true dimensionality erasure through a pimpl or similar, the region algorithms will detect the effective dimensionality on the data itself and dispatch to the correct, optimal (templated) implementation.
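The dispatch idea can be sketched as a runtime switch that selects a template instantiation. This is an assumed illustration, not the PR's code: `box3`, `effective_dims`, `area_impl` and `area` are hypothetical names, and the box is assumed non-empty with `min <= max` in every dimension.

```cpp
#include <array>
#include <cstddef>

// Hypothetical box with storage dimensionality 3, half-open [min, max).
struct box3 {
    std::array<std::size_t, 3> min;
    std::array<std::size_t, 3> max;
};

// Detect the effective dimensionality on the data itself.
int effective_dims(const box3& b) {
    int d = 3;
    while (d > 0 && b.min[d - 1] == 0 && b.max[d - 1] == 1) --d;
    return d;
}

// Templated implementation: only touches the first EffDims dimensions.
template <int EffDims>
std::size_t area_impl(const box3& b) {
    std::size_t area = 1;
    for (int d = 0; d < EffDims; ++d) {
        area *= b.max[d] - b.min[d];
    }
    return area;
}

// Runtime dispatch to the instantiation matching the effective dimensionality.
std::size_t area(const box3& b) {
    switch (effective_dims(b)) {
    case 0: return area_impl<0>(b);
    case 1: return area_impl<1>(b);
    case 2: return area_impl<2>(b);
    default: return area_impl<3>(b);
    }
}
```

The trailing dimensions contribute a factor of 1 to the area, so skipping them does not change the result; the same reasoning lets more expensive algorithms run in their lower-dimensional form.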

To improve robustness, this PR turns {range,id,chunk,subrange,box,region}_cast into checked casts, meaning that a debug build will assert that no value is cast to a storage dimensionality below its effective dimensionality. Multiple generic tests were using the cast functions to create lower-dimensionality ranges; for these, I have introduced test_utils::truncate_{range,id,...} as a replacement.
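A checked cast of this kind might look roughly like the sketch below, shown for a range stored as a plain array. `checked_range_cast` is a hypothetical name and does not reproduce the signature of Celerity's `range_cast`; the check relies on the standard `assert` macro and therefore only fires in builds without `NDEBUG`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

template <std::size_t DimsOut, std::size_t DimsIn>
std::array<std::size_t, DimsOut> checked_range_cast(const std::array<std::size_t, DimsIn>& in) {
    // Every dropped dimension must have extent 1, i.e. the effective
    // dimensionality of the input must not exceed DimsOut.
    for (std::size_t d = DimsOut; d < DimsIn; ++d) {
        assert(in[d] == 1 && "cast below effective dimensionality");
    }
    std::array<std::size_t, DimsOut> out;
    out.fill(1); // added dimensions get the neutral extent 1
    std::copy_n(in.begin(), std::min(DimsIn, DimsOut), out.begin());
    return out;
}
```

Casting `{8}` up to three dimensions yields `{8, 1, 1}`, and casting that back down to one dimension returns `{8}` without tripping the assertion.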

Grid Microbenchmarks

For development, I have implemented a bunch of benchmarks on both old and new implementations (for the old impl, benchmark code is added in "Re-implement grid data structures with normalized regions" and removed again in "Remove inclusion of old grid implementation").

The new impl appears to be about 4x faster on average across a wide variety of scenarios, but there are also individual instances where it is >5x slower. Bear in mind that this is not an apples-to-apples comparison because the resulting tilings differ between old and new impl.

Future Work

- Use the 0-dimensional region type in our new region_map implementation
- Make region_map determine the impl dimensionality from its range::get_min_dimensions
- Return regions instead of boxes in region_map::get_region_values
- Investigate if region_map can be made to operate in the same normalized representation to guarantee determinism

github-actions

clang-tidy made some suggestions

include/ranges.h

src/grid.cc

github-actions · 2023-08-15T11:38:26Z

Benchmark results with std::vector regions

Check-perf-impact results: (ef93035622456aad56ecc5a640eb0c87)

⚠️ Significant slowdown in some microbenchmark results: 4 individual benchmarks affected
🚀 Significant speedup in some microbenchmark results: 31 individual benchmarks affected
➕ Added microbenchmark(s): 55 individual benchmarks affected

Overall relative execution time: 0.93x (mean of relative medians)

src/grid.cc

PeterTh

Looks good overall, with a few change requests/comments.

As previously discussed offline, I am a bit concerned about the increase in the number of benchmark cases, since the current infrastructure has no concept of groups or of the importance of individual tests. But this is something that can be fixed independently in a future PR.

include/buffer_transfer_manager.h

include/grid.h

include/ranges.h

include/workaround.h

src/distributed_graph_generator.cc

src/grid.cc

github-actions · 2023-08-25T15:53:20Z

Benchmark results with gch::small_vector regions

Check-perf-impact results: (734d1429560e5faf06afaacb00f6674a)

⚠️ Significant slowdown in some microbenchmark results: benchmark intrusive graph dependency handling with N nodes - 10 / checking for dependencies, benchmark intrusive graph dependency handling with N nodes - 100 / creating nodes, generating large command graphs for N nodes - 16 / contracting tree topology
🚀 Significant speedup in some microbenchmark results: 37 individual benchmarks affected
➕ Added microbenchmark(s): 55 individual benchmarks affected

Overall relative execution time: 0.91x (mean of relative medians)

fknorr · 2023-08-25T16:00:17Z

Regions appear to be marginally faster when using gch::small_vector as the coordinate storage, even with mimalloc active. This is probably due to many regions just being single boxes in practice.

PeterTh · 2023-08-28T10:02:43Z

> Regions appear to be marginally faster when using gch::small_vector as the coordinate storage, even with mimalloc active. This is probably due to many regions just being single boxes in practice.

This makes sense to me: even with mimalloc, no allocation is better than an allocation. In that case, I'd go with that storage type; it's not like it changes anything about the interface or usage.

psalz

Very nice! That's an impressive level of optimization. I cannot say that I've grokked all the algorithms yet, will have to take another look on Monday.

include/region_map.h

src/buffer_transfer_manager.cc

src/buffer_manager.cc

test/grid_test_utils.cc

include/print_utils.h

include/ranges.h

test/grid_test_utils.h

src/grid.cc

psalz

A few more notes!

src/grid.cc

test/grid_benchmarks.cc

github-actions

clang-tidy made some suggestions

test/integration/backend.cc

github-actions · 2023-09-14T07:23:54Z

Check-perf-impact results: (5a19ced85f862a00d0114dd241122462)

⚠️ Significant slowdown in some microbenchmark results: building command graphs in a dedicated scheduler thread for N nodes - 1 > immediate submission to a scheduler thread / contracting tree topology, benchmark independent task pattern with N tasks - 100 / task generation
🚀 Significant speedup in some microbenchmark results: 28 individual benchmarks affected
➕ Added microbenchmark(s): 55 individual benchmarks affected

Overall relative execution time: 0.93x (mean of relative medians)

fknorr · 2023-09-14T07:28:02Z

The huge "slowdown" in the independent-task-generation benchmark looks spurious to me.

PeterTh · 2023-09-14T09:57:36Z

Probably. You'd think a test where a run takes ~10ms wouldn't have such big variance, but the previous run already had huge error bars. A bit surprising.

psalz

🚢

fknorr mentioned this pull request Aug 15, 2023

Re-implement box and region types #201

Closed

3 tasks

fknorr self-assigned this Aug 15, 2023

github-actions bot reviewed Aug 15, 2023

include/ranges.h (resolved)

src/grid.cc (resolved)

fknorr marked this pull request as ready for review August 15, 2023 11:41

fknorr requested review from psalz and PeterTh August 15, 2023 11:42

fknorr commented Aug 15, 2023

src/grid.cc (outdated, resolved)

fknorr force-pushed the grid-from-scratch branch from 7a1f0dd to 0d17c4c Compare August 23, 2023 09:40

fknorr added a commit to fknorr/celerity-runtime that referenced this pull request Aug 23, 2023

[BASE] celerity#204 Re-implement box and region types

ee39542

PeterTh requested changes Aug 24, 2023

celerity deleted a comment from github-actions bot Aug 25, 2023

fknorr force-pushed the grid-from-scratch branch from 1559539 to c24bdf3 Compare August 28, 2023 08:01

fknorr mentioned this pull request Aug 28, 2023

Add GDB pretty-printers for range and grid types #207

Merged

3 tasks

fknorr added this to the 0.5.0 milestone Sep 8, 2023

psalz reviewed Sep 8, 2023

fknorr force-pushed the grid-from-scratch branch from 2f190f4 to 1ec1219 Compare September 9, 2023 10:58

psalz requested changes Sep 12, 2023

src/grid.cc (outdated, resolved)

src/grid.cc (outdated, resolved)

src/grid.cc (outdated, resolved)

src/grid.cc (outdated, resolved)

test/grid_benchmarks.cc (outdated, resolved)

fknorr and others added 3 commits September 13, 2023 20:18

Create correct 0-dimensional chunks for fence commands

ad60fdf

Checked range/id casts, rename range/id_min/max

db8f554

Re-implement grid data structures with normalized regions

0a2e037

fknorr force-pushed the grid-from-scratch branch from a0d550c to 9a5ebcf Compare September 13, 2023 21:20

github-actions bot reviewed Sep 13, 2023

test/integration/backend.cc (outdated, resolved)

fknorr added 2 commits September 14, 2023 09:09

Port runtime to new grid implementation

2ff28d5

Remove inclusion of old grid implementation

e2560bb

fknorr and others added 11 commits September 14, 2023 09:09

Remove Allscale dependency

e3dc450

Move grid benchmarks to benchmark executable

9e1f7df

Use small_vector for storage of region boxes

d49a6d5

Re-format code after grid re-implementation (CI formatting check seems broken)

e834361

Simplify one usage of regions in buffer_manager to boxes

f47d192

Address reviwer comments on grid

354a4b7

Rename grid::get_min_dimensions() to get_effective_dims(), move to detail:: for public types

9655940

Get rid of unnecessary "remove overlap" step in region normalization

0ca4458

Add comments on potential future region algorithm optimizations

2e67e44

fmt format subranges as "[offset] + [range]" instead of "[min] - [max]"

17feadb

Update benchmark resuts for grid re-implementation

c6ce3e6

fknorr force-pushed the grid-from-scratch branch from 9a5ebcf to c6ce3e6 Compare September 14, 2023 07:23

celerity deleted a comment from github-actions bot Sep 14, 2023

Rename first+last iterator pairs to begin+end

85a98a8

fknorr requested review from PeterTh and psalz September 14, 2023 13:58

psalz approved these changes Sep 14, 2023

PeterTh approved these changes Sep 15, 2023

fknorr merged commit 3d7c59c into master Sep 15, 2023

fknorr deleted the grid-from-scratch branch September 15, 2023 08:51
