Zerocopy
Current GNU Radio implements a large circular buffer on the output of each port. Because the buffers are circular, boundary conditions are very easy to deal with. This makes the implementation simple and straightforward.
GRAS implements a message-based buffer model, where upstream blocks pass smaller, reference-counted buffers to downstream blocks. This implementation allows for advanced control of buffer allocation, size, affinity, zero-copy workflows, and more.
Generally speaking, a processing function reads from an input buffer and writes to an output buffer. Under certain circumstances, the input buffer can also serve as the output buffer of a processing function ("in-placing"). This reduces memory bandwidth and takes advantage of the CPU cache. While this is possible with the current GNU Radio scheduler, special modifications are needed, and the circumstances under which in-placing is possible are extremely limited due to the shared circular buffer model.
Here is a branch implementing said feature on GNU Radio classic: https://github.com/guruofquality/gnuradio/tree/inplace_blocks
Under GRAS, in-placing is done when an input buffer is unique, in other words, when its reference count is 1 and the block is the only holder. In-place ports have to be enabled manually, because the scheduler cannot safely assume that a block has finished reading an input buffer before it writes to the output. To understand this issue further, consider the implementation of an adder block with N input ports and 1 output port (shown here with three inputs).
    for i in range(num_items_in_buffer):
        out[0][i] = ins[0][i] + ins[1][i]
        out[0][i] += ins[2][i]
Input buffers 0 and 1 can each be in-placed to become the output buffer, because they are read before the output is written. However, this is not the case for input buffer 2. If input buffer 2 became the output buffer, it would be written to before it was read. Only the author of the processing function can understand this and communicate the limitations of the algorithm to the scheduler.
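The hazard can be reproduced outside of any scheduler. The following is a minimal sketch using plain Python lists as stand-ins for buffers; the function name `add3` and the sample values are illustrative assumptions, not part of GRAS.

```python
def add3(out, ins, n):
    """Three-input adder from the text, one item at a time."""
    for i in range(n):
        out[i] = ins[0][i] + ins[1][i]  # inputs 0 and 1 are read here
        out[i] += ins[2][i]             # input 2 is read after out[i] was written
    return out

a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
expected = [111, 222, 333]

# Separate output buffer: always correct.
separate = add3([0, 0, 0], [list(a), list(b), list(c)], 3)

# Output aliases input 0: each item is read before it is overwritten.
buf0 = list(a)
inplace0 = add3(buf0, [buf0, list(b), list(c)], 3)

# Output aliases input 2: each item is overwritten before it is read.
buf2 = list(c)
inplace2 = add3(buf2, [list(a), list(b), buf2], 3)

print(separate == expected, inplace0 == expected, inplace2 == expected)
```

In the last case every item of input 2 is replaced by the partial sum before it is added, so the result is twice `a + b` rather than `a + b + c`.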
See InputPortConfig::inline_buffer/set_input_config: https://github.com/guruofquality/gras/blob/master/include/gras/block.hpp
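The two conditions described above (the port is enabled for in-placing, and the input buffer is unique) could be combined roughly as follows. This is only a sketch of the decision, not the GRAS scheduler: `RefBuffer`, `pick_output`, and the `inline_enabled` list are hypothetical names invented for illustration, and the explicit holder counter stands in for whatever reference counting the real buffers use.

```python
class RefBuffer:
    """Hypothetical reference-counted buffer (not the GRAS buffer type)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.holders = 1  # the block that received the buffer

    def share(self):
        """Model a second consumer holding the same buffer."""
        self.holders += 1
        return self

    def unique(self):
        return self.holders == 1


def pick_output(inputs, inline_enabled, size):
    """Reuse the first input that is both enabled and unique, else allocate."""
    for buf, enabled in zip(inputs, inline_enabled):
        if enabled and buf.unique():
            return buf
    return RefBuffer(size)


# Per-port flags for the three-input adder: input 2 must never be in-placed.
adder_inline = [True, True, False]

in0 = RefBuffer(8).share()  # also held elsewhere, so not unique
in1 = RefBuffer(8)          # unique and enabled
in2 = RefBuffer(8)          # unique but not enabled
reused = pick_output([in0, in1, in2], adder_inline, 8)

in1.share()                 # now no enabled input is unique
fresh = pick_output([in0, in1, in2], adder_inline, 8)
```

Here `reused` is `in1`, while the second call falls back to a newly allocated buffer because the only unique input is on a port that was not enabled.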
A device (dedicated hardware, an on-board chip, etc.) may allocate its own buffers from a memory-mapped pool. These buffers may be input or output buffers. To achieve zero copy, we want the scheduler to directly use these buffers 1) as the input to a downstream block and 2) as the output to an upstream block. It is the goal of GRAS to solve both 1) and 2).
With the new buffering model, using the memory-mapped buffer as input to a downstream block is very straightforward. However, forcing the output buffer to be filled by an upstream block is more difficult, and requires knowledge of the topology. We hope to implement the upstream case as well when topological conditions are fitting.
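The downstream case can be illustrated with Python's standard `mmap` module. This is a sketch under stated assumptions: an anonymous mapping stands in for a device's memory-mapped pool, and `downstream` is a hypothetical consumer; none of this is the GRAS allocator API.

```python
import mmap

# Anonymous mapping as a stand-in for a device's memory-mapped buffer pool.
pool = mmap.mmap(-1, 4096)
pool[0:8] = b"ABCDEFGH"      # pretend the device wrote eight samples

view = memoryview(pool)      # wraps the mapping, no copy
frame = view[0:8]            # slicing a memoryview does not copy either

def downstream(buf):
    """Hypothetical downstream block: reads the samples in place."""
    return sum(buf)

total = downstream(frame)

# A later write through the pool is visible through the frame,
# which shows that both refer to the same memory.
pool[0:1] = b"Z"
first_after_write = frame[0]

# Views must be released before the mapping can be closed.
frame.release()
view.release()
pool.close()
```

The downstream block only ever sees a view into the pool, so the cost of handing over a frame does not depend on its size.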
See output_buffer_allocator/input_buffer_allocator: https://github.com/guruofquality/gras/blob/master/include/gras/block.hpp
Consider blocks like head, skiphead, delay, throttle, or any block that only manipulates tags. These blocks do not mutate the buffers in any way. It is possible to just pass buffers downstream and avoid the usual memcpy. When a port or buffer is known to be read-only, the requirements are even weaker than for the in-place logic, since the buffer does not need to be unique. It's always safe to read the buffer and pass all or part of it to a downstream port.
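As a rough illustration, a head-style block can forward part of its input as a read-only view instead of copying it. The sketch below uses Python's `memoryview` as a stand-in for a buffer handle; `head_passthrough` is a hypothetical function and not the GRAS `get_input_buffer`/`post_output_buffer` interface.

```python
def head_passthrough(in_buf, items_left):
    """Forward at most items_left items of a read-only input without copying."""
    n = min(len(in_buf), items_left)
    return in_buf[:n], items_left - n

source = bytearray(b"0123456789")
incoming = memoryview(source).toreadonly()  # read-only view, no copy

out, left = head_passthrough(incoming, 4)
forwarded = bytes(out)                      # copy made here only for inspection

# The forwarded view still refers to the original memory.
source[0] = ord("X")
first_after_write = out[0]
```

Because the view is read-only, the same input buffer could be handed to several downstream ports at once without any of them being able to corrupt it.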
See get_input_buffer/post_output_buffer: https://github.com/guruofquality/gras/blob/master/include/gras/block.hpp