Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] Adding graph support for enqueue function submit ... #15385

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -709,6 +709,32 @@ optimize such partial barriers.
_{endnote}_]
|====

==== Command Graph

The functions in this section are only available if the
link:./sycl_ext_oneapi_graph.asciidoc[
sycl_ext_oneapi_graph] extension is supported.

|====
a|
[frame=all,grid=none]
!====
a!
[source,c++]
----
namespace sycl::ext::oneapi::experimental {

template <typename CommandGroupFunc>
void submit(command_graph<graph_state::modifiable> g, CommandGroupFunc&& cgf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to correspond to command_graph::add? That's the API you use to build a graph in explicit mode. If that is the case, then the name of this API seems wrong. No commands are actually being submitted with this API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this maps to add. Naming is a compromise again between seamless integration into enqueue function extension and its current SYCL-like naming scheme: submit*.

Another option could be something like add_in_order or add_linear, indicating the implicit creation of edges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this API implicitly create a linear graph? The corresponding function submit(sycl::queue q) doesn't imply a linear set of commands. Why should it be different when building a graph?

Regarding naming, how bout add_graph_node (assuming it does not have the linear semantic)?


}
----
!====
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've not defined any exceptions here, but I presume this would inherit the exceptions from the explicit graph.add() API. We should try link to that wording here.

_Effects_: Submits a command-group function object (as defined by the SYCL
specification) to the `command_graph` for deferred execution. Subsequent calls
will create a linear chain of nodes with implicit edges.
Comment on lines +734 to +735
Copy link
Contributor

@EwanC EwanC Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about these linear semantics, situations I can think of where this won't happen are

  1. if a buffer accessor is used in the command-group that could create edges to nodes to other predecessor.
  2. using handler::depends_on() in the command-group can create an edge.
  3. Using the explicit API that to create a node with more than one predecessor inbetween two calls to this API

|====


== Issues

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#include <sycl/detail/common.hpp>
#include <sycl/event.hpp>
#include <sycl/ext/oneapi/experimental/graph.hpp>
#include <sycl/ext/oneapi/properties/properties.hpp>
#include <sycl/handler.hpp>
#include <sycl/nd_range.hpp>
Expand Down Expand Up @@ -103,6 +104,12 @@ void submit(queue Q, CommandGroupFunc &&CGF,
Q, std::forward<CommandGroupFunc>(CGF), CodeLoc);
}

template <typename CommandGroupFunc>
void submit(command_graph<graph_state::modifiable> G, CommandGroupFunc &&CGF) {
G.add(std::forward<CommandGroupFunc>(CGF),
{property::node::depends_on_all_leaves()});
}

template <typename CommandGroupFunc>
event submit_with_event(queue Q, CommandGroupFunc &&CGF,
const sycl::detail::code_location &CodeLoc =
Expand Down
10 changes: 10 additions & 0 deletions sycl/test-e2e/Graph/Explicit/ext_oneapi_enqueue_functions.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/ext_oneapi_enqueue_functions.cpp"
99 changes: 99 additions & 0 deletions sycl/test-e2e/Graph/Inputs/ext_oneapi_enqueue_functions.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
// Tests the enqueue free function kernel shortcuts.

#include "../graph_common.hpp"
#include <sycl/ext/oneapi/experimental/enqueue_functions.hpp>
#include <sycl/properties/all_properties.hpp>

template <typename T>
void run_kernels_usm_in_order(queue Q, const size_t Size, T *DataA, T *DataB,
T *DataC, std::vector<T> &Output, T Pattern) {
exp_ext::fill(Q, DataA, Pattern, Size);

exp_ext::single_task(Q, [=]() {
for (size_t i = 0; i < Size; ++i) {
DataB[i] = i;
}
});

exp_ext::parallel_for(Q, sycl::range<1>{Size}, [=](sycl::item<1> Item) {
DataC[Item] = DataA[Item] * DataB[Item];
});

exp_ext::copy(Q, DataC, Output.data(), Size);
}

template <typename T>
void add_kernels_usm_in_order(
exp_ext::command_graph<exp_ext::graph_state::modifiable> G,
const size_t Size, T *DataA, T *DataB, T *DataC, std::vector<T> &Output,
T Pattern) {
exp_ext::submit(G,
[&](sycl::handler &CGH) { CGH.fill(DataA, Pattern, Size); });

exp_ext::submit(G, [&](sycl::handler &CGH) {
CGH.single_task([=]() {
for (size_t i = 0; i < Size; ++i) {
DataB[i] = i;
}
});
});

exp_ext::submit(G, [&](sycl::handler &CGH) {
CGH.parallel_for(sycl::range<1>{Size}, [=](sycl::item<1> Item) {
DataC[Item] = DataA[Item] * DataB[Item];
});
});

exp_ext::submit(
G, [&](sycl::handler &CGH) { CGH.copy(DataC, Output.data(), Size); });
}

template <typename T>
void add_nodes_in_order(
exp_ext::command_graph<exp_ext::graph_state::modifiable> Graph, queue Queue,
const size_t Size, T *DataA, T *DataB, T *DataC, std::vector<T> &Output,
T Pattern) {
#if defined(GRAPH_E2E_EXPLICIT)
add_kernels_usm_in_order(Graph, Size, DataA, DataB, DataC, Output, Pattern);
#elif defined(GRAPH_E2E_RECORD_REPLAY)
Graph.begin_recording(Queue);
run_kernels_usm_in_order(Queue, Size, DataA, DataB, DataC, Output, Pattern);
Graph.end_recording(Queue);
#else
assert(0 && "Error: Cannot use add_nodes without selecting an API");
#endif
}

int main() {
queue InOrderQueue{property::queue::in_order{}};

using T = int;
T Pattern = 42;

T *PtrA = malloc_device<T>(Size, InOrderQueue);
T *PtrB = malloc_device<T>(Size, InOrderQueue);
T *PtrC = malloc_device<T>(Size, InOrderQueue);

std::vector<T> Output(Size);

exp_ext::command_graph Graph{InOrderQueue};

add_nodes_in_order(Graph, InOrderQueue, Size, PtrA, PtrB, PtrC, Output,
Pattern);

auto GraphExec = Graph.finalize();

InOrderQueue.submit([&](handler &CGH) { CGH.ext_oneapi_graph(GraphExec); });
InOrderQueue.wait_and_throw();

free(PtrA, InOrderQueue);
free(PtrB, InOrderQueue);
free(PtrC, InOrderQueue);

for (size_t i = 0; i < Size; i++) {
T Ref = Pattern * i;
assert(Output[i] == Ref);
}

return 0;
}
Original file line number Diff line number Diff line change
Expand Up @@ -5,55 +5,6 @@
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

// Tests the enqueue free function kernel shortcuts.
#define GRAPH_E2E_RECORD_REPLAY

#include "../graph_common.hpp"
#include <sycl/ext/oneapi/experimental/enqueue_functions.hpp>
#include <sycl/properties/all_properties.hpp>

int main() {
queue InOrderQueue{property::queue::in_order{}};

using T = int;

T *PtrA = malloc_device<T>(Size, InOrderQueue);
T *PtrB = malloc_device<T>(Size, InOrderQueue);
T *PtrC = malloc_device<T>(Size, InOrderQueue);

exp_ext::command_graph Graph{InOrderQueue};
Graph.begin_recording(InOrderQueue);

T Pattern = 42;
exp_ext::fill(InOrderQueue, PtrA, Pattern, Size);

exp_ext::single_task(InOrderQueue, [=]() {
for (size_t i = 0; i < Size; ++i) {
PtrB[i] = i;
}
});

exp_ext::parallel_for(
InOrderQueue, sycl::range<1>{Size},
[=](sycl::item<1> Item) { PtrC[Item] = PtrA[Item] * PtrB[Item]; });

std::vector<T> Output(Size);
exp_ext::copy(InOrderQueue, PtrC, Output.data(), Size);

Graph.end_recording();

auto GraphExec = Graph.finalize();

InOrderQueue.submit([&](handler &CGH) { CGH.ext_oneapi_graph(GraphExec); });
InOrderQueue.wait_and_throw();

free(PtrA, InOrderQueue);
free(PtrB, InOrderQueue);
free(PtrC, InOrderQueue);

for (size_t i = 0; i < Size; i++) {
T Ref = Pattern * i;
assert(Output[i] == Ref);
}

return 0;
}
#include "../Inputs/ext_oneapi_enqueue_functions.cpp"
Loading