
Intel® Optimizations for TensorFlow 2.5.0

Released by @rsketine on 24 May, 22:49 · commit bcda61c

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.5.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes introduced in TensorFlow 2.5.0, please also see the TensorFlow 2.5.0 release notes.

Major features and improvements

  • oneAPI Deep Neural Network Library (oneDNN) optimizations from Intel-optimized TensorFlow are now also available in the x86-64 Linux official TensorFlow builds (see the sketch after this list).
    • Install with pip install tensorflow
    • Enable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1
    • Disable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=0
  • For more details and performance data, please refer to the blog: Leverage Intel Deep Learning Optimizations in TensorFlow
  • There are a few differences between the oneDNN optimizations in the official builds and Intel Optimized TensorFlow:
    • Only the native layout format is supported (the environment variable TF_ENABLE_MKL_NATIVE_FORMAT has no effect)
    • The oneDNN optimizations in official TensorFlow do not include int8 quantization; it remains available in Intel Optimized TensorFlow and will be added in later versions of official TensorFlow
    • OpenMP is not available with the oneDNN optimizations in official TensorFlow (it continues to be available in Intel Optimized TensorFlow)
  • Support for Python 3.9 has been added.
  • Upgraded oneDNN to v2.2
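
A minimal sketch of toggling these optimizations from Python; the variable must be set before tensorflow is imported, because it is read when the library loads:

    import os

    # "1" enables the oneDNN CPU optimizations in the official build,
    # "0" disables them; must be set before importing TensorFlow.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

    import tensorflow as tf
    print(tf.__version__)  # expect 2.5.0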

Breaking changes

  • The default for Intel Optimized TensorFlow is now the native format; to use blocked formats, set the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 (see the sketch after this list).
  • Int8 quantization works only when the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 is set.
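
A minimal sketch of opting back into blocked formats (and hence int8 quantization) in Intel Optimized TensorFlow; as above, the variable must be set before the tensorflow import:

    import os

    # The default in this release is the native format; "0" restores
    # blocked formats, which int8 quantization requires.
    os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"

    import tensorflow as tf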

Other changes

  • Enabled the oneDNN primitive cache, improving performance of models with batch size 1
  • Various op fusions with FP32, BFloat16, and INT8 data types (see the sketch after this list):
    • Conv2D + Squeeze + BiasAdd fusion
    • MatMul + BiasAdd + Add fusion
    • MatMul + BiasAdd + LeakyRelu fusion
  • CombinedNonMaxSuppression (CNMS) performance optimization
  • Enabled DNNL CPU dispatch control.
  • Graph pattern matching for Grappler op-fusion optimization
  • Support for the quantized pooling op with signed 8-bit inputs
  • Enabled MklConv/MklFusedConv with explicit padding
  • Removed nGraph build support (tensorflow#42870)
  • Execute small GEMMs single-threaded
  • Removed unnecessary oneDNN dependencies
  • Removed DNNL 0.x support
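
A minimal sketch of a graph containing the MatMul + BiasAdd + LeakyRelu pattern targeted by the fusions listed above (an illustration only; whether the fusion actually fires depends on the build, device, and data types):

    import numpy as np
    import tensorflow as tf

    @tf.function
    def dense_leaky_relu(x, w, b):
        # MatMul -> BiasAdd -> LeakyRelu: the node sequence the
        # Grappler remapper can rewrite into a single fused op.
        y = tf.nn.bias_add(tf.matmul(x, w), b)
        return tf.nn.leaky_relu(y, alpha=0.1)

    x = tf.constant(np.random.rand(1, 64).astype(np.float32))
    w = tf.constant(np.random.rand(64, 32).astype(np.float32))
    b = tf.constant(np.zeros(32, dtype=np.float32))
    print(dense_leaky_relu(x, w, b).shape)  # (1, 32)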

Bug fixes

  • Issues resolved in TensorFlow 2.5
  • Issues resolved in oneDNN 2.2
  • Fixed memory leak in MKLAddN
  • Fixed duplicate kernel registration of BatchMatMulV2
  • Fixed unit test failures due to benchmark test API changes
  • Fixed incorrect result of _MklMaxPoolGrad (tensorflow#40122)

Versions and components

Known issues

  • A variable number of OMP threads is created when compiling with the XLA flag ON (tensorflow#40836)
  • Three unit test failures with the MKL blocked format in the 2.5 branch:
    • //tensorflow/python/debug:analyzer_cli_test
    • //tensorflow/python/eager:backprop_test
    • //tensorflow/python/kernel_tests:relu_op_test
  • bfloat16 is not guaranteed to work on AVX or AVX2.
  • See the open issues for oneDNN optimizations