diff --git a/docs/predict-api/v2/grpc_predict_v2.proto b/docs/predict-api/v2/grpc_predict_v2.proto
index 8d54bf1a1c1..6b2475a2e8e 100644
--- a/docs/predict-api/v2/grpc_predict_v2.proto
+++ b/docs/predict-api/v2/grpc_predict_v2.proto
@@ -194,9 +194,8 @@ message ModelInferRequest
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferInputTensor::contents must
   // not be specified for any input tensor.
@@ -249,9 +248,8 @@ message ModelInferResponse
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferOutputTensor::contents must
   // not be specified for any output tensor.
diff --git a/docs/predict-api/v2/required_api.md b/docs/predict-api/v2/required_api.md
index 93b8e18500a..56e90e1468e 100644
--- a/docs/predict-api/v2/required_api.md
+++ b/docs/predict-api/v2/required_api.md
@@ -421,9 +421,9 @@ Tensor data given explicitly is provided in a JSON array. Each element
 of the array may be an integer, floating-point number, string or
 boolean value. The server can decide to coerce each element to the
 required type or return an error if an unexpected value is
-received. Note that fp16 is problematic to communicate explicitly
-since there is not a standard fp16 representation across backends nor
-typically the programmatic support to create the fp16 representation
+received. Note that fp16 and bf16 are problematic to communicate explicitly
+since there is not a standard fp16/bf16 representation across backends nor
+typically the programmatic support to create the fp16/bf16 representation
 for a JSON number.
 
 For example, the 2-dimensional matrix:
@@ -667,9 +667,8 @@ failure. The request and response messages for ModelInfer are:
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferInputTensor::contents must
   // not be specified for any input tensor.
@@ -722,9 +721,8 @@ failure. The request and response messages for ModelInfer are:
   // what is expected by the tensor's shape and data type. The raw
   // data must be the flattened, one-dimensional, row-major order of
   // the tensor elements without any stride or padding between the
-  // elements. Note that the FP16 data type must be represented as raw
-  // content as there is no specific data type for a 16-bit float
-  // type.
+  // elements. Note that the FP16 and BF16 data types must be represented as
+  // raw content as there is no specific data type for a 16-bit float type.
   //
   // If this field is specified then InferOutputTensor::contents must
   // not be specified for any output tensor.
@@ -868,3 +866,4 @@ of each type, in bytes.
 | FP32 | 4 |
 | FP64 | 8 |
 | BYTES | Variable (max 2<sup>32</sup>) |
+| BF16 | 2 |
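
Note on the change: because BF16 now has to travel as raw content, a client needs to build the 2-byte-per-element buffer itself. The following is a minimal Python sketch of one way that might look, not the project's own client code. It assumes little-endian byte order (the text above specifies only flattened, row-major layout with no padding) and converts by truncating the low 16 bits of the float32 encoding rather than rounding. The helper names `float_to_bf16_bits`, `bf16_bits_to_float`, and `pack_bf16_raw` are hypothetical.

```python
import struct

BF16_SIZE_BYTES = 2  # matches the "BF16 | 2" row added to the size table


def float_to_bf16_bits(value: float) -> int:
    """Return a 16-bit BF16 pattern by truncating the float32 encoding.

    Assumption: truncation (no rounding) is acceptable for this sketch.
    """
    (f32_bits,) = struct.unpack("<I", struct.pack("<f", value))
    return f32_bits >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bf16_bits_to_float(bits: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value


def pack_bf16_raw(matrix) -> bytes:
    """Flatten a 2-D list in row-major order and pack it as raw BF16 bytes.

    Assumption: little-endian byte order for each 16-bit element.
    """
    flat = [x for row in matrix for x in row]
    return struct.pack(f"<{len(flat)}H", *(float_to_bf16_bits(x) for x in flat))


# A 2x2 tensor becomes 4 elements * 2 bytes = 8 bytes of raw content.
raw = pack_bf16_raw([[1.0, 2.0], [-0.5, 3.0]])
```

The resulting `raw` buffer is what would be placed in the raw contents field for a tensor declared with datatype BF16 and shape `[2, 2]`; a production client would more likely use round-to-nearest-even and handle NaN explicitly.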