Update documentation after multi-pass removal
psalz committed Apr 17, 2023
1 parent 931338c commit 85ed4f2
Showing 5 changed files with 31 additions and 35 deletions.
10 changes: 5 additions & 5 deletions docs/host-tasks.md
@@ -29,7 +29,7 @@ the `*_host_task` selector must be used for selecting the access mode.
 ```cpp
 celerity::distr_queue q;
 celerity::buffer<float, 1> result;
-q.submit([=](celerity::handler &cgh) {
+q.submit([&](celerity::handler &cgh) {
     celerity::accessor acc{buffer, cgh, celerity::access::all{},
         celerity::read_only_host_task};
     cgh.host_task(celerity::on_master_node, [=]{
@@ -54,7 +54,7 @@ this node receives:
 ```cpp
 celerity::distr_queue q;
-q.submit([=](celerity::handler &cgh) {
+q.submit([&](celerity::handler &cgh) {
     cgh.host_task(celerity::range<1>(100), [=](celerity::partition<1> part) {
         printf("This node received %zu items\n", part.get_subrange().range[0]);
     });
@@ -116,10 +116,10 @@ operations eligible to be run concurrently, Celerity can be notified of this by
 celerity::distr_queue q;
 celerity::experimental::collective_group first_group;
 celerity::experimental::collective_group second_group;
-q.submit([=](celerity::handler &cgh) {
+q.submit([&](celerity::handler &cgh) {
     cgh.host_task(celerity::experimental::collective(first_group), []...);
 });
-q.submit([=](celerity::handler &cgh) {
+q.submit([&](celerity::handler &cgh) {
     cgh.host_task(celerity::experimental::collective(second_group), []...);
 });
 ```
@@ -142,7 +142,7 @@ splitting them along the first (slowest) dimension into contiguous memory portio
 ```cpp
 celerity::distr_queue q;
 celerity::buffer<float, 2> buf;
-q.submit([=](celerity::handler& cgh) {
+q.submit([&](celerity::handler& cgh) {
     celerity::accessor acc{buffer, cgh,
         celerity::experimental::access::even_split<2>{},
         celerity::read_only_host_task};
48 changes: 22 additions & 26 deletions docs/pitfalls.md
@@ -26,34 +26,30 @@ if(rand() > 1337) {

 Celerity tasks submitted to the `celerity::distr_queue` are executed
 _asynchronously_ at a later point in time. This means that the stack
-surrounding a Celerity command group function might
-have been unwound by the time it is being called.
+surrounding a command function ("kernel") may have been unwound by the time it
+is being invoked.

-For this reason Celerity by default enforces that tasks only capture
-surrounding variables by value, rather than by reference. If you know what
-you are doing and would like to disable this check, you can use the
-`distr_queue::submit` overload accepting reference captures:
+While Celerity and the underlying SYCL implementation will try to detect and
+prevent certain types of common errors (for example capturing accessors by
+reference), not all mistakes can be caught reliably.

-```cpp
-celerity::distr_queue q;
-bool flag = false;
-q.submit(celerity::allow_by_ref, [&flag](celerity::handler &cgh) {...});
-```
-
-Also, similar to SYCL kernels, both host and device kernels must not capture accessors
-or other values from the command group function by reference since the kernels will
-outlive it. To avoid mistakes, make it a habit to **never use by-reference capture
-defaults**, even when using `allow_by_ref`:
+In particular when using [host tasks](host-tasks.md), it is important to ensure
+that all values that are captured by reference outlive the task:

 ```cpp
-celerity::distr_queue q;
-celerity::buffer<int, 1> buf;
-bool flag = false;
-q.submit(celerity::allow_by_ref, [=, &flag](celerity::handler &cgh) {
-    celerity::accessor acc{buffer, cgh, celerity::access::all{},
-        celerity::read_only_host_task};
-    cgh.host_task(celerity::on_master_node, [=, &flag] {
-        flag = acc[0] == 42;
+int global_variable = 22;
+
+void some_function(celerity::distr_queue& q) {
+    int local_variable = 42;
+    q.submit([&](celerity::handler& cgh) {
+        cgh.host_task([&] {
+            printf("%d\n", global_variable); // safe, global variable outlives task
+            printf("%d\n", local_variable); // dangling reference!
+        });
     });
-});
-```
+}
+```
+
+> Celerity supports experimental APIs that can replace most if not all uses for reference captures.
+> See `celerity::experimental::host_object`, `celerity::experimental::side_effect` and
+> `celerity::experimental::fence`.
2 changes: 1 addition & 1 deletion docs/range-mappers.md
@@ -38,7 +38,7 @@ following command group specifies two different range mappers (whose
 definition is omitted) for buffers `buf_a` and `buf_b`:
 ```cpp
-queue.submit([=](celerity::handler& cgh) {
+queue.submit([&](celerity::handler& cgh) {
     celerity::accessor r_a{cgh, buf_a, my_mapper, celerity::read_only};
     celerity::accessor dw_b{cgh, buf_b, other_mapper, celerity::write_only, celerity::no_init};
2 changes: 1 addition & 1 deletion docs/reductions.md
@@ -12,7 +12,7 @@ The following distributed program computes the sum from 0 to 999 in `sum_buf` us
 ```c++
 celerity::distr_queue q;
 celerity::buffer<size_t, 1> sum_buf{{1}};
-q.submit([=](celerity::handler& cgh) {
+q.submit([&](celerity::handler& cgh) {
     auto rd = celerity::reduction(sum_buf, cgh, sycl::plus<size_t>{},
         celerity::property::reduction::initialize_to_identity{});
     cgh.parallel_for(celerity::range<1>{1000}, rd,
4 changes: 2 additions & 2 deletions docs/tutorial.md
@@ -119,7 +119,7 @@ Now we are ready to do the actual edge detection. For this we will write a
 are specified in Celerity is very similar to how it is done in SYCL:

 ```cpp
-queue.submit([=](celerity::handler& cgh) {
+queue.submit([&](celerity::handler& cgh) {
     // TODO: Buffer accessors
     cgh.parallel_for<class MyEdgeDetectionKernel>(
         celerity::range<2>(img_height - 2, img_width - 2),
@@ -252,7 +252,7 @@ handler by calling `celerity::handler::host_task`. Add the following code at the
 your `main()` function:

 ```cpp
-queue.submit([=](celerity::handler& cgh) {
+queue.submit([&](celerity::handler& cgh) {
     celerity::accessor out{edge_buf, cgh, celerity::access::all{}, celerity::read_only_host_task};
     cgh.host_task(celerity::on_master_node, [=]() {
         stbi_write_png("result.png", img_width, img_height, 1, out.get_pointer(), 0);
