-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-implement box and region types #204
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
Benchmark results with std::vector regionsCheck-perf-impact results: (ef93035622456aad56ecc5a640eb0c87)
Overall relative execution time: 0.93x (mean of relative medians) |
7a1f0dd
to
0d17c4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, with a few change requests/comments.
As previously discussed offline, I am a bit concerned about the increase in the number of benchmark cases, since the current infrastructure has no way concepts of groups, or the importance of individual tests. But this is something which can be fixed independently in a future PR.
Benchmark results with gch::small_vector regionsCheck-perf-impact results: (734d1429560e5faf06afaacb00f6674a)
Overall relative execution time: 0.91x (mean of relative medians) |
Regions appear to be marginally faster when using |
1559539
to
c24bdf3
Compare
This makes sense to me, even with mimalloc no allocation is better than an allocation. In that case, I'd go with that storage type, it's not like it changes anything about the interface or usage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! That's an impressive level of optimization. I cannot say that I've groked all the algorithms yet, will have to take another look on Monday.
2f190f4
to
1ec1219
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more notes!
a0d550c
to
9a5ebcf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
…tail:: for public types
9a5ebcf
to
c6ce3e6
Compare
Check-perf-impact results: (5a19ced85f862a00d0114dd241122462)
Overall relative execution time: 0.93x (mean of relative medians) |
The huge "slowdown" in the independent-task-generation benchmark looks spurious to me. |
Probably. You'd think a test where a run takes ~10ms wouldn't have such big variance, but the previous run already had huge error bars. A bit surprising. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
What started as academic curiosity turned into a classic month-long summertime deep-dive into a topic that is not even close to the top of our priority list. Oh well.
Currently, Celerity uses AllScale's
GridBox
andGridRegion
types to describe point sets that originate from boxes or tilings of boxes. Out of these,GridRegion
is particularly interesting: Internally avector<GridBox>
, it ensures that the tiling does not contain overlaps, and also opportunistically merges adjacent boxes when computing unions.On the path to this PR, I set out on the following mission:
int
dimensions parameter, unsigned coordinate types and support for 0-dimensional instances.The user-facing API is mostly self-explanatory and similar to the old one, but the inner workings require some explanation. I have made an effort to document all algorithmic decisions as well as possible.
Finally, this PR is able to remove the entirety of AllScale from our dependencies (some 20k LOC) 🥳
Region Normalization
The key innovation in the new region implementation is normalization. While regions are still represented as vectors of boxes, the new region type normalizes the tiling such that any two regions that cover the same points have identical box sequences. Normalization of an arbitrary user-provided box set works as follows:
The resulting representation is unique for the covered set of points and maximizes the extent along the last ("fastest") dimension while minimizing the number of total boxes. A small example in 2D:
All region set operations (union, intersection, difference) work on these normalized regions and take the appropriate step to ensure their result remains normalized as well.
Effective Dimensionality
Like we saw in the implementation of the new region map, treating every region as three-dimensional for the purpose of "dimensionality erasure" can be costly. This is why this PR introduces the concept of effective dimensionality:
Practically this means that the result of a
box_cast<3>(box<1>)
has storage dimensionality 3 but retains effective dimensionality 1, and it can bebox_cast
back to a 1-dimensional box without data loss.The efficiency of region algorithms profits from knowledge about the data's effective dimensionality. Instead of true dimensionality erasure through a pimpl or similar, the region algorithms will detect the effective dimensionality on the data itself and dispatch to the correct, optimal (templated) implementation.
To improve robustness, this PR turns
{range,id,chunk,subrange,box,region}_cast
into checked casts, meaning that a debug build will assert that no type can be cast to a storage dimensionality below its effective dimensionality. Multiple tests were using thecast
functions to create lower-dimensionality ranges in generic tests, for these, I have introducedtest_utils::truncate_{range,id,...}
as a replacement.Grid Microbenchmarks
For development, I have implemented a bunch of benchmarks on both old and new implementations (for the old impl, benchmark code is added in
Re-implement grid data structures with normalized regions
and removed again inRemove inclusion of old grid implementation
).The new impl appears to be about 4x faster on average across a wide variety of scenarios, but there are also individual instances where it is >5x slower. Bear in mind that this is not an apples-to-apples comparison because the resulting tilings differ between old and new impl.
Future Work
region_map
implementationregion_map
determine the impl dimensionality from itsrange::get_min_dimensions
region_map::get_region_values
region_map
can be made to operate in the same normalized representation to guarantee determinism