v0.5.0 - Enchanting Elderberry
Right on time for the holidays, we bring you a new major release with several new features, quality-of-life improvements and debugging facilities.
Thanks to everybody who contributed to this release: @fknorr, @GagaLP, @PeterTh, @psalz!
HIGHLIGHTS
- The `distr_queue::fence` and `buffer_snapshot` APIs introduced in Celerity 0.4.0 are now stable (#225).
- In some situations it may be necessary to prevent kernels from being split in a certain way (for example to prevent overlapping writes); this can now be achieved using the new `experimental::constrain_split` API (#212).
- Speaking of splits, the new `experimental::hint` API can be used to control how a kernel is split across worker nodes (#227).
- Celerity now warns at runtime when a task declares reads from uninitialized buffers or writes with overlapping ranges between nodes (#224).
- The accessor out-of-bounds detection first introduced in Celerity 0.4.0 now also supports host tasks (#211).
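To make the idea of a split constraint more concrete, here is a minimal, self-contained sketch of splitting a 1D index range so that chunk boundaries only fall on multiples of a given granularity. It is purely illustrative and is not Celerity's implementation: the `chunk` type and the `split_constrained` function are hypothetical names introduced for this example, and the even distribution of blocks is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a half-open 1D chunk [offset, offset + size).
struct chunk {
    std::size_t offset;
    std::size_t size;
};

// Illustrative only (not part of Celerity): splits [0, total) into at most
// `num_chunks` pieces whose boundaries are multiples of `granularity`.
// Only the last chunk may end on a non-multiple, at `total` itself.
std::vector<chunk> split_constrained(std::size_t total, std::size_t num_chunks, std::size_t granularity) {
    assert(num_chunks > 0 && granularity > 0);

    // Number of indivisible blocks; the last block may be partial.
    const std::size_t num_blocks = (total + granularity - 1) / granularity;
    // We cannot create more chunks than there are blocks.
    const std::size_t actual_chunks = std::min(num_chunks, num_blocks);

    std::vector<chunk> result;
    std::size_t block_offset = 0;
    for (std::size_t i = 0; i < actual_chunks; ++i) {
        // Distribute blocks evenly; the first (num_blocks % actual_chunks)
        // chunks receive one extra block.
        const std::size_t blocks = num_blocks / actual_chunks + (i < num_blocks % actual_chunks ? 1 : 0);
        const std::size_t begin = block_offset * granularity;
        const std::size_t end = std::min(total, (block_offset + blocks) * granularity);
        result.push_back({begin, end - begin});
        block_offset += blocks;
    }
    return result;
}
```

With a granularity of 16, for example, no two chunks can ever share a 16-element block, which is the kind of guarantee needed to avoid overlapping writes when a kernel writes whole blocks at a time.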
Changelog
We recommend using the following SYCL versions with this release:
- DPC++: 61e51015 or newer
- hipSYCL: d2bd9fc7 or newer
Added
- Add new environment variable `CELERITY_PRINT_GRAPHS` to control whether task and command graphs are logged (#197, #236)
- Introduce new experimental `for_each_item` utility to iterate over a celerity range (#199)
- Add new environment variables `CELERITY_HORIZON_STEP` and `CELERITY_HORIZON_MAX_PARALLELISM` to control Horizon generation (#199)
- Add support for out-of-bounds checking for host accessors (also enabled via `CELERITY_ACCESSOR_BOUNDARY_CHECK`) (#211)
- Add new `debug::set_task_name` utility for naming tasks to aid debugging (#213)
- Add new `experimental::constrain_split` API to limit how a kernel can be split (#212)
- Add GDB pretty-printers for common Celerity types (#207)
- `distr_queue::fence` and `buffer_snapshot` are now stable, subsuming the `experimental::` APIs of the same name (#225)
- Celerity now warns at runtime when a task declares reads from uninitialized buffers or writes with overlapping ranges between nodes (#224)
- Introduce new `experimental::hint` API for providing the runtime with additional information on how to execute a task (#227)
- Introduce new `experimental::hints::split_1d` and `experimental::hints::split_2d` task hints for controlling how a task is split into chunks (#227)
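The difference between a one-dimensional and a two-dimensional split can be illustrated with a small standalone sketch. This is a conceptual model only, not Celerity's code: `box2d`, `piece`, `split_grid`, `sketch_split_1d` and `sketch_split_2d` are hypothetical names, and choosing the most square factor pair for the 2D case is an assumption made for this example; the runtime's actual heuristic may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type: a rectangular chunk of a 2D index space.
struct box2d {
    std::size_t row_offset, col_offset, rows, cols;
};

// Evenly partitions [0, extent) into `parts` pieces and returns
// {offset, size} of piece `index`; earlier pieces absorb the remainder.
std::pair<std::size_t, std::size_t> piece(std::size_t extent, std::size_t parts, std::size_t index) {
    const std::size_t base = extent / parts;
    const std::size_t rem = extent % parts;
    return {index * base + std::min(index, rem), base + (index < rem ? 1 : 0)};
}

// Splits a rows x cols index space into a grid_rows x grid_cols grid of chunks.
std::vector<box2d> split_grid(std::size_t rows, std::size_t cols, std::size_t grid_rows, std::size_t grid_cols) {
    assert(grid_rows > 0 && grid_cols > 0);
    std::vector<box2d> chunks;
    for (std::size_t r = 0; r < grid_rows; ++r) {
        for (std::size_t c = 0; c < grid_cols; ++c) {
            const auto row_piece = piece(rows, grid_rows, r);
            const auto col_piece = piece(cols, grid_cols, c);
            chunks.push_back({row_piece.first, col_piece.first, row_piece.second, col_piece.second});
        }
    }
    return chunks;
}

// 1D split: all chunks are stripes along dimension 0.
std::vector<box2d> sketch_split_1d(std::size_t rows, std::size_t cols, std::size_t num_chunks) {
    return split_grid(rows, cols, num_chunks, 1);
}

// 2D split: use the factor pair of num_chunks that is closest to square
// (an assumption made for this illustration).
std::vector<box2d> sketch_split_2d(std::size_t rows, std::size_t cols, std::size_t num_chunks) {
    assert(num_chunks > 0);
    std::size_t grid_rows = 1;
    for (std::size_t f = 1; f * f <= num_chunks; ++f) {
        if (num_chunks % f == 0) grid_rows = f;
    }
    return split_grid(rows, cols, grid_rows, num_chunks / grid_rows);
}
```

For an 8x8 index space and four chunks, the 1D variant yields four 2x8 stripes while the 2D variant yields four 4x4 tiles. Tiles have a shorter perimeter than stripes of the same area, which tends to matter for stencil-like kernels that exchange data along chunk borders.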
Changed
- Horizons can now also be triggered by graph breadth. This improves performance in some scenarios, and prevents programs with many independent tasks from running out of task queue space (#199)
Fixed
- In edge cases, command graph generation would fail to generate await-push commands when re-distributing reduction results (#223)
- Command graph generation was missing an anti-dependency between push-commands of partial reduction results and the final reduction command (#223)
- Don't create multiple smaller push-commands instead of a single large one in some rare situations (#229)
- Unit tests that inspect logs contained a race that would cause spurious failures (#234)