
Intel® Optimizations for TensorFlow 2.5.0

Released by @rsketine on 24 May, 22:49 · commit bcda61c

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.5.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes introduced in TensorFlow 2.5.0, please also see the TensorFlow 2.5.0 release notes.

Major features and improvements

  • oneAPI Deep Neural Network Library (oneDNN) optimizations from Intel-optimized TensorFlow are now also available in the x86-64 Linux official TensorFlow builds (see the sketch after this list).
    • Install with pip install tensorflow
    • Enable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1
    • Disable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=0
  • For more details and performance data, please refer to the blog: Leverage Intel Deep Learning Optimizations in TensorFlow
  • There are a few differences between the oneDNN optimizations in the official builds and Intel Optimized TensorFlow:
    • Only the native layout format is supported (the environment variable TF_ENABLE_MKL_NATIVE_FORMAT has no effect)
    • The oneDNN optimizations in official TensorFlow do not include int8 quantization; it remains available in Intel Optimized TensorFlow and will be added in later versions of official TensorFlow
    • OpenMP is not available with the oneDNN optimizations in official TensorFlow (it continues to be available in Intel Optimized TensorFlow)
  • Support for Python 3.9 has been added.
  • Upgraded oneDNN to v2.2
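
A minimal sketch of toggling these optimizations from Python; the variable must be set before tensorflow is imported, because it is read when the library loads:

    import os

    # "1" enables the oneDNN CPU optimizations in the official build,
    # "0" disables them; must be set before importing TensorFlow.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

    import tensorflow as tf
    print(tf.__version__)  # expect 2.5.0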

Breaking changes

  • The default for Intel Optimized TensorFlow is now the native format; to use blocked formats, set the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 (see the sketch after this list).
  • Int8 quantization works only when the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 is set.
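
A minimal sketch of opting back into blocked formats (and hence int8 quantization) in Intel Optimized TensorFlow; as above, the variable must be set before the tensorflow import:

    import os

    # The default in this release is the native format; "0" restores
    # blocked formats, which int8 quantization requires.
    os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"

    import tensorflow as tf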

Other changes

  • Enabled the oneDNN primitive cache, improving performance of models with batch size 1
  • Various op fusions with FP32, BFloat16, and INT8 data types (see the sketch after this list):
    • Conv2D + Squeeze + BiasAdd fusion
    • MatMul + BiasAdd + Add fusion
    • MatMul + BiasAdd + LeakyRelu fusion
  • CombinedNonMaxSuppression (CNMS) performance optimization
  • Enabled DNNL CPU dispatch control.
  • Graph pattern matching for Grappler op-fusion optimization
  • Support for the quantized pooling op with signed 8-bit inputs
  • Enabled MklConv/MklFusedConv with explicit padding
  • Removed nGraph build support (tensorflow#42870)
  • Execute small GEMMs single-threaded
  • Removed unnecessary oneDNN dependencies
  • Removed DNNL 0.x support
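
A minimal sketch of a graph containing the MatMul + BiasAdd + LeakyRelu pattern targeted by the fusions listed above (an illustration only; whether the fusion actually fires depends on the build, device, and data types):

    import numpy as np
    import tensorflow as tf

    @tf.function
    def dense_leaky_relu(x, w, b):
        # MatMul -> BiasAdd -> LeakyRelu: the node sequence the
        # Grappler remapper can rewrite into a single fused op.
        y = tf.nn.bias_add(tf.matmul(x, w), b)
        return tf.nn.leaky_relu(y, alpha=0.1)

    x = tf.constant(np.random.rand(1, 64).astype(np.float32))
    w = tf.constant(np.random.rand(64, 32).astype(np.float32))
    b = tf.constant(np.zeros(32, dtype=np.float32))
    print(dense_leaky_relu(x, w, b).shape)  # (1, 32)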

Bug fixes

  • Issues resolved in TensorFlow 2.5
  • Issues resolved in oneDNN 2.2
  • Fixed memory leak in MKLAddN
  • Fixed duplicate kernel registration of BatchMatMulV2
  • Fixed unit test failures due to benchmark test API changes
  • Fixed incorrect result of _MklMaxPoolGrad (tensorflow#40122)

Versions and components

Known issues

  • A variable number of OMP threads is created when compiling with the XLA flag ON (tensorflow#40836)
  • Three unit test failures with the MKL blocked format in the 2.5 branch:
    • //tensorflow/python/debug:analyzer_cli_test
    • //tensorflow/python/eager:backprop_test
    • //tensorflow/python/kernel_tests:relu_op_test
  • bfloat16 is not guaranteed to work on AVX or AVX2.
  • See the open issues for oneDNN optimizations