Update Infer proto docs to mention BFloat16 type (kubeflow#2159)

Signed-off-by: Ryan McCormick <rmccormick@nvidia.com>
magdalenakuhn17 · Apr 28, 2022 · 3dc5bfb
1 parent 81d81b3
commit 3dc5bfb

Showing 2 changed files with 12 additions and 15 deletions.
diff --git a/docs/predict-api/v2/grpc_predict_v2.proto b/docs/predict-api/v2/grpc_predict_v2.proto
@@ -194,9 +194,8 @@ message ModelInferRequest
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferInputTensor::contents must
   // not be specified for any input tensor.
@@ -249,9 +248,8 @@ message ModelInferResponse
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferOutputTensor::contents must
   // not be specified for any output tensor.

diff --git a/docs/predict-api/v2/required_api.md b/docs/predict-api/v2/required_api.md
@@ -421,9 +421,9 @@ Tensor data given explicitly is provided in a JSON array. Each element
 of the array may be an integer, floating-point number, string or
 boolean value. The server can decide to coerce each element to the
 required type or return an error if an unexpected value is
-received. Note that fp16 is problematic to communicate explicitly
-since there is not a standard fp16 representation across backends nor
-typically the programmatic support to create the fp16 representation
+received. Note that fp16 and bf16 are problematic to communicate explicitly
+since there is not a standard fp16/bf16 representation across backends nor
+typically the programmatic support to create the fp16/bf16 representation
 for a JSON number.
 
 For example, the 2-dimensional matrix:
@@ -667,9 +667,8 @@ failure. The request and response messages for ModelInfer are:
       // what is expected by the tensor's shape and data type. The raw
       // data must be the flattened, one-dimensional, row-major order of
       // the tensor elements without any stride or padding between the
-      // elements. Note that the FP16 data type must be represented as raw
-      // content as there is no specific data type for a 16-bit float
-      // type.
+      // elements. Note that the FP16 and BF16 data types must be represented as
+      // raw content as there is no specific data type for a 16-bit float type.
       //
       // If this field is specified then InferInputTensor::contents must
       // not be specified for any input tensor.
@@ -722,9 +721,8 @@ failure. The request and response messages for ModelInfer are:
       // what is expected by the tensor's shape and data type. The raw
       // data must be the flattened, one-dimensional, row-major order of
       // the tensor elements without any stride or padding between the
-      // elements. Note that the FP16 data type must be represented as raw
-      // content as there is no specific data type for a 16-bit float
-      // type.
+      // elements. Note that the FP16 and BF16 data types must be represented as
+      // raw content as there is no specific data type for a 16-bit float type.
       //
       // If this field is specified then InferOutputTensor::contents must
       // not be specified for any output tensor.
@@ -868,3 +866,4 @@ of each type, in bytes.
 | FP32      | 4            |
 | FP64      | 8            |
 | BYTES     | Variable (max 2<sup>32</sup>) |
+| BF16      | 2            |
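
Reviewer note: since BF16 now has to travel as raw content at 2 bytes per element, a client needs some way to produce those bytes. The sketch below is one possible approach, not part of this commit or of any KServe client library. It assumes little-endian byte order (the docs touched here do not state one) and round-to-nearest-even conversion from FP32; the helper names `float_to_bf16_bits`, `pack_bf16`, and `unpack_bf16` are hypothetical.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    """Return the 16-bit BF16 pattern for value (hypothetical helper).

    BF16 keeps the sign, the 8 exponent bits, and the top 7 mantissa
    bits of an IEEE 754 FP32 value, so conversion is a rounded shift.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # NaN: truncate and force a mantissa bit so the result stays a NaN.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even, before dropping the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def pack_bf16(values) -> bytes:
    """Flatten values into raw bytes, 2 bytes per element.

    Assumption: little-endian byte order.
    """
    return b"".join(struct.pack("<H", float_to_bf16_bits(v)) for v in values)


def unpack_bf16(raw: bytes) -> list:
    """Decode raw BF16 bytes back into Python floats."""
    count = len(raw) // 2
    patterns = struct.unpack(f"<{count}H", raw[: count * 2])
    # Shifting the pattern back into the top half of an FP32 is exact.
    return [struct.unpack("<f", struct.pack("<I", p << 16))[0] for p in patterns]


raw = pack_bf16([1.0, -2.0, 0.5])
decoded = unpack_bf16(raw)
```

The resulting `raw` bytes would be what a client appends for one tensor in the raw content field, with the elements already in flattened row-major order. Values that are not exactly representable in BF16 come back rounded, so a round trip is lossless only for inputs such as the ones used here.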