Network execution happens when the user calls inferRequest->infer() or inferRequest->start_async().
At a high level, all we need to do is enqueue OCL kernels with their buffers. To do that, we need to find the cldnn::network instance, as it contains the buffers required for execution. (TBD: Link to data structure doc) CPUStreamExecutor holds the streams, and each stream corresponds to a cldnn::network structure.
The main body of network execution is cldnn::network::execute_impl. In this function, set_arguments() is called to set the OpenCL arguments and execute_primitive is called to enqueue the kernels to the OCL queue.
In the case of a synchronous API call (i.e. inferRequest->infer()), waiting for completion of the kernels is also required. This wait is triggered from the cldnn::network_output::get_memory() function.
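The flow described above (set arguments, enqueue kernels, then wait only when the output is read) can be sketched with a toy model. This is a minimal illustration, not the plugin's code: toy_queue, toy_primitive, toy_network, execute() and get_output() are hypothetical names invented here, and the "queue" simply defers work until finish() is called.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an in-order OCL queue: work is only recorded at
// enqueue time and runs when finish() is called.
class toy_queue {
public:
    void enqueue(std::function<void()> kernel) { pending_.push_back(std::move(kernel)); }

    // "Waits" for everything enqueued so far (here: runs it inline, in order).
    void finish() {
        for (auto& kernel : pending_) kernel();
        pending_.clear();
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Hypothetical stand-in for one primitive of the network.
struct toy_primitive {
    std::string name;
    bool arguments_set;
};

// Toy counterpart of the described flow, not the real cldnn::network.
class toy_network {
public:
    explicit toy_network(const std::vector<std::string>& names) {
        for (const auto& n : names) primitives_.push_back(toy_primitive{n, false});
    }

    // Step 1: bind arguments for every primitive.
    // Step 2: enqueue every kernel. Nothing has executed when this returns.
    void execute() {
        for (auto& p : primitives_) p.arguments_set = true;
        for (auto& p : primitives_) {
            queue_.enqueue([this, name = p.name, ready = p.arguments_set] {
                if (ready) executed_.push_back(name);
            });
        }
    }

    // Synchronous path: reading the output first waits for the queue to drain.
    const std::vector<std::string>& get_output() {
        queue_.finish();
        return executed_;
    }

    std::size_t pending_kernels() const { return queue_.pending(); }

private:
    std::vector<toy_primitive> primitives_;
    toy_queue queue_;
    std::vector<std::string> executed_;
};
```

In this sketch, calling execute() on a two-primitive network leaves two kernels pending, and only get_output() forces them to complete, which mirrors why the synchronous API has to wait before returning results.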
This function also contains some logic to dump intermediate buffers for debugging purposes. Because it is related to memory usage, it deserves some description too.
In order to dump an intermediate buffer, we need to wait until the kernel is about to be called (for a source buffer) or has just been called (for a destination buffer). At any other moment the intermediate data is not available, because the buffers are reused from the memory pool. TBD: Link to data structure doc
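The timing constraint can be illustrated with a toy network whose layers share a two-slot pool. This is only a sketch under simplifying assumptions: toy_pool_network and its methods are hypothetical, each "kernel" just adds one to its input, and layer i always writes to slot i % 2, so a slot is overwritten two layers later.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical two-slot pool: layer i writes its output into slot (i % 2),
// so the slot holding layer i's output is overwritten by layer i + 2.
class toy_pool_network {
public:
    explicit toy_pool_network(std::size_t layers) : layers_(layers), slots_(2, 0.0f) {}

    // Runs every layer on `input`. If `dumped` is non-null, a copy of the
    // output of `dump_layer` is taken right after that layer has run.
    float run(float input, std::size_t dump_layer, float* dumped) {
        float current = input;
        for (std::size_t i = 0; i < layers_; ++i) {
            float& out = slots_[i % 2];  // destination buffer reused from the pool
            out = current + 1.0f;        // the "kernel": add one
            current = out;
            if (i == dump_layer && dumped != nullptr) {
                *dumped = out;           // the only moment this value is intact
            }
        }
        return current;
    }

    // What the slot used by `layer` contains once the whole network has finished.
    float slot_after_run(std::size_t layer) const { return slots_[layer % 2]; }

private:
    std::size_t layers_;
    std::vector<float> slots_;
};
```

With four layers and an input of 0, the value dumped for layer 0 right after it runs is 1, while the same slot read after the whole run holds 3, the output of layer 2. Reading the buffer at the end would therefore report the wrong layer's data.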
get_stream().finish() is called first, as we need to be synchronized with kernel execution. Then we access the intermediate buffer. How it is accessed depends on the kind of buffer. If it is usm_host or usm_shared, it is accessed directly. If it is usm_device, it is accessed after copying the data into host memory, because the host cannot access usm_device memory directly. If it is ocl memory, we map it into host memory.
Typical network execution uses usm_host for network input and output and usm_device for the buffers inside the network.
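The per-kind access rules above can be summarized as a small dispatch. The sketch below is illustrative only: toy_alloc, toy_access, toy_buffer, host_access_for() and read_for_dump() are hypothetical names, and a std::vector stands in for every kind of allocation, so the branches differ only in what a real runtime would have to do at that point.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical allocation kinds mirroring the names used in the text.
enum class toy_alloc { usm_host, usm_shared, usm_device, cl_mem };

// How the host reaches the data of a buffer.
enum class toy_access { direct, copied_to_host, mapped };

// Hypothetical buffer: `storage` stands in for the real allocation.
struct toy_buffer {
    toy_alloc type;
    std::vector<float> storage;
};

// Decides how the host can read a buffer of the given kind.
toy_access host_access_for(toy_alloc type) {
    switch (type) {
        case toy_alloc::usm_host:
        case toy_alloc::usm_shared:
            return toy_access::direct;          // host-visible memory
        case toy_alloc::usm_device:
            return toy_access::copied_to_host;  // host cannot dereference it
        case toy_alloc::cl_mem:
            return toy_access::mapped;          // map into host address space
    }
    throw std::invalid_argument("unknown allocation type");
}

// Returns host-readable contents suitable for dumping.
std::vector<float> read_for_dump(const toy_buffer& buf) {
    switch (host_access_for(buf.type)) {
        case toy_access::direct:
            // A real runtime would read through the pointer without a copy.
            return buf.storage;
        case toy_access::copied_to_host: {
            // A real runtime would issue a device-to-host copy here.
            std::vector<float> staging(buf.storage.begin(), buf.storage.end());
            return staging;
        }
        case toy_access::mapped:
            // A real runtime would map the buffer, read it, then unmap it.
            return buf.storage;
    }
    throw std::logic_error("unreachable");
}
```

For example, host_access_for(toy_alloc::usm_device) yields toy_access::copied_to_host, which is the extra step that internal buffers need during a dump, while usm_host input and output buffers fall in the direct case.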