NPU Plugin Batching
===============================


.. meta::
   :description: Batching is handled by the NPU plugin in OpenVINO™
                 in two different modes: concurrency-based inference
                 or batching handled by the compiler.

The NPU plugin first checks whether the following conditions are met (a minimal sketch of an
equivalent application-side check follows the list):

* The batch size is on the first axis.
* All inputs and outputs have the same batch size.
* The model does not contain states.

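The snippet below is a minimal sketch, using the OpenVINO Python API, of how an application could
verify these conditions before relying on this mode. The model path is a placeholder, and the check
only approximates what the plugin does internally (for example, it looks for ``ReadValue`` and
``Assign`` operations as a stand-in for the state check):

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # placeholder path

   # Conditions 1 and 2: the batch dimension is the first axis and is identical
   # for every input and output.
   batch_dims = [port.get_partial_shape()[0]
                 for port in list(model.inputs) + list(model.outputs)]
   same_batch = all(dim == batch_dims[0] for dim in batch_dims)

   # Condition 3 (approximation): no stateful operations in the model.
   stateless = not any(op.get_type_name() in ("ReadValue", "Assign")
                       for op in model.get_ops())

   print("Concurrency-based batching may apply:", same_batch and stateless)
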
If these conditions are met, due to current compiler limitations and ongoing work on performance
improvements for batch sizes larger than one, the NPU plugin first tries to compile and execute
the original model with the batch size forced to 1. If this compilation succeeds, the plugin
detects the difference in batch size between the original model layout and the
transformed/compiled layout and will:

- internally construct multiple command lists, one for each input,
- execute each command list with the proper offsets into the input/output buffers,
- notify the user of the completion of the inference request once all command lists have been executed.

This concurrency-based batching mode is transparent to the application: a single inference request
handles all inputs from the batch, as shown in the sketch below. Performance might be lower than
with regular batching; this mode is intended to provide basic batching functionality on older
drivers, or when the model cannot yet be compiled with a batch size larger than one.

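As an illustration, the following sketch runs one inference request over a batched input on the
NPU device. The model path, input layout, and batch size are placeholder assumptions (a
single-input model is assumed); the application code is the same whichever batching mode the
plugin selects internally:

.. code-block:: python

   import numpy as np
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")          # placeholder path
   model.reshape([4, 3, 224, 224])               # assumed layout: batch of 4 on the first axis
   compiled = core.compile_model(model, "NPU")

   batch = np.zeros((4, 3, 224, 224), dtype=np.float32)  # placeholder batched input
   request = compiled.create_infer_request()
   request.infer({compiled.input(0): batch})     # one request covers the whole batch
   results = request.get_output_tensor(0).data
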
If these conditions are not met, the NPU plugin tries to compile and execute the original model
with the given batch size N, as it would any other regular model.

Note: Due to current compiler limitations and ongoing work on performance improvements for batch
sizes other than one, the concurrency-based mode is currently tried first. Once performance
improves and more models can be compiled with a batch size larger than one, the default order will
change: the NPU plugin will first try to compile and execute the original model with the given
batch size, and fall back to concurrency-based batching only if that compilation fails.