
Releases: Intel-tensorflow/tensorflow

Intel® Optimizations for TensorFlow 2.5.0

24 May 22:49
bcda61c

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.5.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes that were introduced in TensorFlow 2.5.0, please also see the TensorFlow 2.5.0 release notes.

Major features and improvements

  • oneAPI Deep Neural Network Library (oneDNN) optimizations from Intel-optimized TensorFlow are now also available in the official x86-64 Linux TensorFlow builds (a minimal sketch of the toggles follows this list).
    • Install with pip install tensorflow
    • Enable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1
    • Disable the CPU optimizations by setting the environment variable TF_ENABLE_ONEDNN_OPTS=0
  • For more details and performance data, please refer to the blog: Leverage Intel Deep Learning Optimizations in TensorFlow
  • There are a few differences from Intel Optimized TensorFlow:
    • Only the native layout format is supported (the environment variable TF_ENABLE_MKL_NATIVE_FORMAT has no effect)
    • The oneDNN optimizations in official TensorFlow do not include int8 quantization (it is still available in Intel Optimized TensorFlow) and will arrive in later versions of official TensorFlow
    • OpenMP is not available with the oneDNN optimizations in official TensorFlow (it continues to be available in Intel Optimized TensorFlow)
  • Support for Python 3.9 has been added.
  • Upgraded oneDNN to v2.2.
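A minimal sketch of the toggle described above, assuming an official x86-64 Linux TensorFlow 2.5.0 wheel. The environment variable is read at startup, so it must be set before TensorFlow is imported; IsMklEnabled() is an internal test utility that reflects the build configuration, not the runtime toggle:

    import os

    # Must be set before TensorFlow is imported; "0" disables the optimizations.
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

    import tensorflow as tf
    from tensorflow.python.framework import test_util

    # Reports whether this build was compiled with oneDNN/MKL support.
    print(tf.__version__, "oneDNN build:", test_util.IsMklEnabled())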

Breaking changes

  • The default for Intel Optimized TensorFlow is now the native format. To use blocked formats, set the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0.
  • Int8 will only work when the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 is set (see the sketch below).
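A minimal sketch of running an int8 model against this build; as with the other toggles, the variable must be set before TensorFlow is imported:

    import os

    # Blocked (MKL) layouts, required by the int8 kernels, must be requested
    # before TensorFlow is imported.
    os.environ["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"

    import tensorflow as tf
    # ... load and run the quantized model as usual from here on.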

Other changes

  • Enabled the oneDNN primitive cache, which improves performance for models with batch size 1.
  • Various op fusions for the FP32, BFloat16, and INT8 data types (see the sketch after this list):
    • Conv2D + Squeeze + BiasAdd fusion
    • MatMul + BiasAdd + Add fusion
    • MatMul + BiasAdd + LeakyRelu fusion
  • CombinedNonMaxSuppression (CNMS) performance optimization
  • Enabled DNNL CPU dispatch control.
  • Graph pattern matching for the grappler op fusion optimization
  • Support for quantized pooling ops with signed 8-bit data.
  • Enabled MklConv/MklFusedConv with explicit padding
  • Removed nGraph build support (tensorflow#42870)
  • Execute small GEMMs single-threaded.
  • Removed unnecessary oneDNN dependencies.
  • Removed DNNL 0.x support.
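The fused patterns above are produced by the grappler remapper from ordinary op sequences. A hedged Python sketch of a graph eligible for the MatMul + BiasAdd + LeakyRelu fusion; whether the fusion actually fires depends on the build, data types, and shapes:

    import numpy as np
    import tensorflow as tf

    w = tf.constant(np.random.rand(64, 32), dtype=tf.float32)
    b = tf.constant(np.random.rand(32), dtype=tf.float32)

    @tf.function
    def dense_leaky(x):
        y = tf.matmul(x, w)           # MatMul
        y = tf.nn.bias_add(y, b)      # + BiasAdd
        return tf.nn.leaky_relu(y)    # + LeakyRelu -> one fused kernel when eligible

    x = tf.constant(np.random.rand(8, 64), dtype=tf.float32)
    print(dense_leaky(x).shape)       # (8, 32)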

Bug fixes

  • Issues resolved in TensorFlow 2.5
  • oneDNN 2.2 resolved issues
  • Fixed a memory leak in MklAddN
  • Fixed duplicate kernel registration of BatchMatMulV2
  • Fixed unit test failures due to benchmark test API changes
  • Fixed incorrect result of _MklMaxPoolGrad (40122)

Versions and components

Known issues

  • A variable number of OMP threads is created when compiling with the XLA flag ON (40836)
  • 3 unit test failures with the MKL blocked format in the 2.5 branch:
    • //tensorflow/python/debug:analyzer_cli_test
    • //tensorflow/python/eager:backprop_test
    • //tensorflow/python/kernel_tests:relu_op_test
  • bfloat16 is not guaranteed to work on AVX or AVX2.
  • See the list of open issues for oneDNN optimizations.

Intel® Optimizations for TensorFlow* 2.4.0

29 Jan 06:46

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.4.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes that were introduced in TensorFlow 2.4.0, please see the TensorFlow 2.4.0 release notes.

New functionality and usability improvements:

  • Disable MKL-DNN 0.x support.

  • Upgrade oneDNN version to v1.6.4.

  • Enable Windows platform support.

  • Enhance Auto-Mixed-Precision support.

  • Enable Eigen Thread Pool feature with --config=mkl_threadpool build option.

  • Introduced the Native Format with oneDNN. You can run benchmarks by setting TF_ENABLE_MKL_NATIVE_FORMAT=1 at run time. This feature is currently enabled for a subset of data types.

  • Reduced the binary size by 40% for both Windows and Linux.

  • Released AVX512 binary packages and containers to showcase out-of-the-box data type support and performance for our customers.

Bug fixes:

  • Issues resolved in TensorFlow 2.4

  • oneDNN resolved issues

  • Bugs fixed for MKL build only

    • Fix UT failure in mkl_layout_pass.cc caused by DCHECK

    • Fix TF_DISABLE_MKL not working in remapper.cc

    • Fix a bug in Convolution + Add fusion

    • Fix UT failure control_flow_ops_test

    • Fix for concat v2 unit test failure

    • Fix UT failures due to explicit padding

    • Fix build failure in MklRelu

    • Fix bug in MklMaxPoolGrad

    • Fix a bug in a fused operator (conv2d + bias_add + add + relu) related to in-place computation with tensor forwarding

    • Fix auto mixed precision bug

  • Upgrade curl to version 7.74 to fix CVEs.

Versions and components

Known issues

  • //tensorflow/python/keras/distribute:collective_all_reduce_strategy_test is failing in 2.4 branch.

  • //tensorflow/python/keras/distribute:multi_worker_callback_tf2_test is failing with Python 3.6 & 3.7

Intel® Optimizations for TensorFlow* 1.15 UP2 Maintenance Release

29 Dec 16:06

This maintenance release of Intel® Optimizations for TensorFlow* 1.15 UP2 is based on the TensorFlow v1.15.0up2 tag (https://github.com/Intel-tensorflow/tensorflow.git) and is built with support for oneAPI Deep Neural Network Library (oneDNN v1.6.4). This revision contains the following features and fixes:

New functionality and usability improvements:
• Support oneDNN version 1.6.4 and its integration with TensorFlow.
• Support the Eigen threadpool feature.
• oneDNN v0.x cleanup - reset the MKL build config and remove oneDNN v0.x related macros.
• Add MKL-DNN ops supporting threadpool.
• Changes to the oneDNN build to remove the binary blob when building with open-source components only.
• Port optimize_for_inference.py.
• Add MatMul + BiasAdd + Add fusion.
• Add MKL GELU kernels.
• Add Eigen GELU.
• Add pattern matcher and GELU fusion.
• Support oneDNN binary op.
• BatchMatMul + Mul fusion.
• Eigen AVX/AVX2 vectorization.
• Disable the _MklFusedBatchNorm op when the input is a 5D tensor.
• Enable oneDNN BatchMatMul support with broadcast and update oneDNN to v1.6.4.
• Add MKL Conv + Bias + LeakyRelu fusion.
• Added missing CPU support for the math.rsqrt op.
• MatMul + Tanh fusion.
• Enable Conv + BiasAdd + LeakyRelu fusion with the Eigen implementation on CPU.
• exp.
• Add auto_mixed_precision_mkl to the run-once optimizer list.
• Update the build name with the new branding.
• Remove the OpenMP dependency from oneDNN when supporting threadpool.
• Boilerplate code for oneDNN threadpool and mkl_conv_ops.cc.

Bug fixes:
• Fix a bug in Convolution + Add fusion.
• Bug fix to in-place computation with tensor forwarding.
• oneDNN build fix.
• Fix DCHECK problem in mkl_layout_pass.cc.
• Fix TF_DISABLE_MKL not working in remapper.cc.
• Minor bug fixes for threadpool unit tests.
• Fixed an auto-mixed-precision bug for ops that have more than one attr type (like FusedBatchNorm).
• Pooling ops build fix.

Additional security and performance patches:
• Remove the grpc dependency.

Known issues:
• Only one unit test failure remains (//tensorflow/python/kernel_tests:relu_op_test), which also exists in the TensorFlow master branch but is disabled there.

Best known methods:
• Gelu API:
If the model uses a GELU op, we suggest using the new tf.nn.gelu API instead of composing it from small operations in Python model code (a minimal sketch follows). An example:
https://github.com/IntelAI/models/blob/master/models/language_modeling/tensorflow/bert_large/inference/generic_ops.py#L88-L106
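A hedged sketch of the substitution; the approximate flag follows the upstream tf.nn.gelu signature, which we assume this backport matches:

    import math
    import tensorflow as tf

    # Composed GELU, as often found in BERT-style model code: many small ops.
    def gelu_composed(x):
        cdf = 0.5 * (1.0 + tf.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * tf.pow(x, 3))))
        return x * cdf

    # Preferred: one API call that the MKL GELU kernels can pick up directly.
    def gelu_fused(x):
        return tf.nn.gelu(x, approximate=True)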
• Freeze graph:
Freezing the graph is an important step for improving inference performance, but the steps vary from model to model. A freeze-graph script for the BERT base inference classifier is provided as a reference (a generic sketch follows): https://github.com/IntelAI/models/blob/master/models/language_modeling/tensorflow/bert_large/inference/export_classifier.py
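For orientation, a generic TF1-style freeze sketch; the checkpoint paths and the output node name ("output/logits") are hypothetical placeholders, not taken from the script above:

    import tensorflow.compat.v1 as tf

    with tf.Session() as sess:
        # Restore the trained graph and weights (hypothetical paths).
        saver = tf.train.import_meta_graph("model.ckpt.meta")
        saver.restore(sess, "model.ckpt")

        # Convert variables to constants so the graph is self-contained.
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph_def, output_node_names=["output/logits"])

        with tf.io.gfile.GFile("frozen_model.pb", "wb") as f:
            f.write(frozen.SerializeToString())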

Intel® Optimizations for TensorFlow* 1.15.0 UP1

29 Aug 16:00

This maintenance release of Intel® Optimizations for TensorFlow* 1.15 UP1 is based on the TensorFlow v1.15.0up1 tag (https://github.com/Intel-tensorflow/tensorflow.git) and is built with support for oneAPI Deep Neural Network Library (oneDNN). This revision contains the following features and fixes:

New functionality and usability improvements:

• Support for oneDNN version 1.4.0 and integration work with TensorFlow.
• Optimized the BFloat16 data type for the MKL backend.
• Add Eigen Bfloat16 vectorization for better performance.
• Compatibility with official TensorFlow 1.15 release unit test pass rate.
• Add comparison and cast op fusion.
• Add Pad+Conv fusion for bf16.
• Replace tensorflow::bfloat16 with Eigen::bfloat16.
• Add MKL support to auto_mixed_precision.
• Adding MklTanh op.
• Threadpool changes for pooling ops.
• Threadpool support for mkl_conv_bwd ops.
• Threadpool support for relu, eltwise and softmax.
• Threadpool api support for misc ops.
• Threadpool support for quantize, dequantize and transpose op.
• Threadpool api implementation for concat and fused batchnorm op.
• Enable DepthwiseConv2D bfloat16 fusions.
• Implement new DNNL1.x MatMul primitive cache.
• Enable BF16 Softmax/SoftmaxGrad.
• Enabling Conv2D bfloat16 fusions.
• Support MatMul fusion for bfloat16 type.
• Enabling conv2D (NCHW format) fusion in grappler remapper.
• Fusing BN and Relu in mkl path.
• Enable DepthwiseConv2D + BiasAdd (+ Relu/Relu6/Elu) fusion.
• Make BFloat16 support for MatMul and BatchMatMul conditionally compatible by removing macros that were guarding DNNLv1.2 specific code.
• Support MKL Quantized Matmul With Bias and Dequantize Op and DNNL 1.0.
• Upgrading RequantizePerChannel Op with API changes in MKLDNN 1.0.
• Changes for DNNL1.x fused_batch_norm and Pooling Ops(Max and Avg).
• Updating QuantizeV2 and Dequantize Ops with API changes in MKLDNN 1.0.
• DNN1.0 integration - concat op.
• Updating MatMul kernels with MKLDNN 1.x API changes.
• DNNL 1.0 op support for Softmax, Identity_op, and Lrn ops.
• Adding support of Conv backward for DNN 1.0.
• MKL-DNNL v1.0 integration with AddN ops.
• Slice and Reshape op support with MKLDNN 1.0.
• MKL-DNN v1.0 integration with pooling ops.
• Relu op MKL-DNN 1.x integration.
• DNNL1.x integration for tf_conv_ops.h and transpose.cc.
• Add weight cache for FP32 MatMul.
• Use buffer as primitive key.
• Avoid unnecessary data reorders.
• Create a partial key for output_scale.
• MatMul, QMatMul and fused ops support for the threadpool API.
• Batch Matmul enhancements.
• Remove duplicate registration for softmax bf16 op.
• Optimization for MirrorPad op.
• Adding BFloat16 unit tests for MKL layout pass.
• Add FP32 fusion of MatMul and Relu.
• Transpose + Maxpool3D + Transpose fusion.
• Reimplement CompareMklDnnLayouts.
• Enable TF_NUM_INTEROP_THREADS for MKL-DNN backend.
• Reuse input tensor in mkl conv2d.
• Conditionally enabling bfloat16.
• Add primitive cache for mkl concat.
• Reverting bias cache optimization.
• Add primitive cache for mkl softmax.
• Enable FP32 FusedMatMul for MKL-DNN.
• Supporting MatMul, Transpose and Softmax with BFloat16 type.
• Add support for Addv2.
• Integrated MKL input conversion op with MKL-DNN v1.x.
• Update Keras LayerNorm to use FusedBatchNorm.

Bug fixes:

• Fix 3 unit test failures:
//tensorflow/python/kernel_tests:svd_op_test
//tensorflow/python:layers_normalization_test
//tensorflow/python/ops/parallel_for:gradients_test
• Fix bug in MklConcat.
• Fix bug in MklMaxPoolGrad.
• Fix bfloat16 build failure in MklRelu.
• Fix dequantize op regression issue.
• Fix incorrect DNNL1.2 integration in pooling backprop.
• Fix bfloat16 integration in MatMul and BatchMatMul.
• Fix a bug in MKL Concat op.
• Fix performance regression in DNNL 1 due to lack of primitive cache on reorder
• Fix build error.
• Fix compilation for DNNL 1.0.
• Fix Shape compilation issue in MKL build.
• Fix a bug in Elu Op.
• Fix MatMul and Elu fusion issue.
• Fix memory leak.
• Fix Eigen related compilation error.
• Fix unit test check_numerics_callback_test.
• Fix bias cache accuracy issue.
• Fix missing libiomp5 issue due to missing deps.
• Fix dequantize accuracy issue and re-enable this op.
• Fix quantize accuracy loss.
• Fix spurious omp thread spawning.

Additional security and performance patches:

• Upgrade Sqlite3 to 3.33.0 to fix CVE-2020-11656
• Upgrade curl version to 7.71.1 to fix CVE-2019-15601
• Remove fft2d.

Known issues:

• 2 unit test failures, listed below, remain; they are the same as in the TensorFlow 2.3 branch.
//tensorflow/python/kernel_tests:relu_op_test
//tensorflow/python/debug:analyzer_cli_test

Intel® Optimizations for TensorFlow* 2.3.0

31 Aug 05:02
f3fbb16

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.3.0 tag and is built with support for oneDNN (oneAPI Deep Neural Network Library). For features and fixes that were introduced in TensorFlow 2.3.0, please see the TensorFlow 2.3.0 release notes.

New functionality and usability improvements:

  • BFloat16 support for Intel CPUs.
  • BFloat16 training optimizations are available for many popular models in the Intel Model Zoo.
  • BFloat16 inference optimizations are available for a limited number of models.
  • The AutoMixedPrecisionMkl feature is supported; see getting-started-with-automixedprecisionmkl (a hedged sketch follows this list).
  • oneDNN - moved from 0.x to version 1.4
  • Support for Intel® MKL-DNN version 0.x is still available.
  • Building with the DNNL0 option is available by specifying --define=build_with_mkl_dnn_v1_only=false
  • The default build with --config=mkl enables DNNL1 with the BFloat16 data type.
  • Released AVX512 binary packages and containers to showcase out-of-the-box BFloat16 data type support and performance for our customers.
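A hedged sketch of enabling the pass programmatically. It assumes the auto_mixed_precision_mkl grappler option is exposed through tf.config in this build (the option name follows the run-once optimizer entry mentioned in the 1.15 UP2 notes); unknown option keys are ignored rather than rejected:

    import tensorflow as tf

    # Ask grappler to run the MKL (bfloat16) auto-mixed-precision rewrite.
    # Assumption: this build exposes the "auto_mixed_precision_mkl" option.
    tf.config.optimizer.set_experimental_options(
        {"auto_mixed_precision_mkl": True})

    # Subsequent tf.function graphs are then eligible for bfloat16 rewriting.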

Bug fixes:

Versions and components

Known issues

  • A variable number of OMP threads is created when compiling with the XLA flag ON (40836)
  • Incorrect result of _MklMaxPoolGrad (40122)
  • test_conv_bn_dropout and test_conv_pool tests of //tensorflow/python:auto_mixed_precision_test fail with MKL backend on AVX.
  • //tensorflow/core/grappler/optimizers:remapper_test is failing in 2.3 branch.

How to:

Intel® Optimizations for TensorFlow* 2.2.0

04 Aug 17:10

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.2.0 tag (https://github.com/tensorflow/tensorflow/tree/v2.2.0) and is built with support for oneAPI Deep Neural Network Library (oneDNN). For features and fixes that were introduced in TensorFlow 2.2.0, please see the TensorFlow 2.2.0 release notes. This build was built from https://github.com/Intel-tensorflow/tensorflow/tree/v2.2.0 and contains the following features and fixes:

New functionality and usability improvements:

  • Add Python 3.8 support in this version

  • Make BFloat16 MatMul & BatchMatMul conditional compliable with oneDNN 1.2

  • Make oneDNN 1.2 the default lib for the MKL backend

  • Add support for DNNL1 builds to public CI

  • Add the exponential_avg_factor attribute to MklFusedBatchNorm ops (a hedged sketch follows this list)

  • Integrate MklLayoutPass with DNNL 1.0

  • Add support for MKL QuantizedMatMulWithBiasAndDequantize ops

  • Integrate fused_batch_norm and Pooling ops (Max and Avg) with DNNL 1.0

  • Optimize the CombinedNonMaxSuppression op's CPU kernel from single-threaded to multi-threaded

  • Integrate Concat, QMatMul, FusedMatMul, QuantizeV2, Dequantize, RequantizePerChannel, Slice, Reshape, Pooling, AddN, Relu and Transpose Ops with DNNL 1.0

  • Integrate MatMul and BatchMatMul BFloat16 kernels with DNNL 1.0

  • Refactor the implementation of Conv forward and backward for DNNL 1.0

  • Add weight cache for FP32 MatMul

  • Updating MKL implementation of Eager API

  • Support for oneDNN version 1.2.2
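A hedged sketch of the attribute via the public wrapper, assuming tf.compat.v1.nn.fused_batch_norm exposes exponential_avg_factor in this build; values below 1.0 fold the batch statistics into a running average:

    import tensorflow as tf

    x = tf.random.normal([8, 16, 16, 32])
    scale, offset = tf.ones([32]), tf.zeros([32])
    running_mean, running_var = tf.zeros([32]), tf.ones([32])

    # exponential_avg_factor=1.0 reproduces the old behavior; 0.9 blends the
    # current batch statistics into the supplied running mean/variance.
    y, new_mean, new_var = tf.compat.v1.nn.fused_batch_norm(
        x, scale, offset, mean=running_mean, variance=running_var,
        epsilon=1e-3, is_training=True, exponential_avg_factor=0.9)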

Bug fixes:

  • Fix MKL test script

  • Upgrade Sqlite3 to fix CVE-2019-19880 CVE-2019-19244 and CVE-2019-19645

  • Fix memory leak in DNNL 0.21.2

  • Fixing MKL test script to accept parameters

  • Fix dequantize accuracy issue and re-enable this OP

  • Fix Shape compilation issue in MKL build

  • Enable some more operators with bfloat16 graph rewrite tests

  • Avoid unnecessary data reorders

  • Create a partial key for output_scale

Additional security and performance patches:

Known issues:

  • TensorFlow 2.2 gets incorrect results from _MklMaxPoolGrad

Intel® Optimizations for TensorFlow* 2.1.1

04 Aug 17:21

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.1.1 tag (https://github.com/tensorflow/tensorflow/tree/v2.1.1) and is built with support for Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). For features and fixes that were introduced in TensorFlow 2.1.1, please see the TensorFlow 2.1.1 release notes. This build was built from https://github.com/Intel-tensorflow/tensorflow/tree/v2.1.1 and contains the following features and fixes:

Bug fixes:

Additional security and performance patches:

Intel® Optimizations for TensorFlow* 2.1.0

04 Aug 17:07

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.1.0 tag (https://github.com/tensorflow/tensorflow/tree/v2.1.0) and is built with support for Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). For features and fixes that were introduced in TensorFlow 2.1.0, please see the TensorFlow 2.1.0 release notes. This build was built from https://github.com/Intel-tensorflow/tensorflow/tree/v2.1.0 and contains the following features and fixes:

New functionality and usability improvements:

  • Support input tensor reuse in Conv2d

  • Support primitive cache for Softmax and Concat, weight cache for quantized MatMul and bias tensor cache for INT8 inference

  • Enable FP32 FusedMatMul for MKL-DNN

  • Conditionally enabling bfloat16, and support MatMul, Transpose and Softmax with BFloat16 type

  • Support for {int8,int8} convolutions and fusions

  • Improve eager performance for small batch sizes

  • Enable TF_NUM_INTEROP_THREADS for the MKL-DNN backend (a sketch of the threading knobs follows this list)

  • Support for Intel® MKL-DNN version 0.21.2
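A sketch of the related threading knobs: TF_NUM_INTEROP_THREADS is the environment-variable route, while the tf.config.threading calls below are the standard programmatic equivalents. Either must be set before the first op executes:

    import tensorflow as tf

    # Number of op-level parallelism pools: independent ops run concurrently.
    tf.config.threading.set_inter_op_parallelism_threads(2)

    # Threads used inside a single op (e.g. one MKL-DNN convolution).
    tf.config.threading.set_intra_op_parallelism_threads(8)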

Bug fixes:

  • Fix compilation error in eager test

  • Fix a memory leak problem

  • Fix Eigen related compilation error

  • Fix performance regression in eager mode

  • Fix the interop default setting in the Intel build

  • Refactor MKL Eager from std::vector to std::hashmap for cleaner design

  • Fix MKL QuantizeV2 operator

  • Fix unit test check_numerics_callback_test failure

  • Fix bias cache accuracy issue

  • Fix debug_grappler unit test failure

  • Add OMP_NUM_THREADS support to mkl tests

  • Parallelizing scatter update op

  • Fix issues with hadoopFileSystem load error message

  • Fix missing libiomp5 issue due to missing deps

  • Upgrade curl to fix CVE-2019-5481 and CVE-2019-5482

  • Revert bias cache optimization

  • Disable "Conv3D with stride > 1" cases

  • Upgrading MKL public CI to py3

  • Set build 2.0 as default in Dockerfile.devel-mkl

  • Fix _MklQuantizeV2 rewrite issue

  • Fix spurious OpenMP thread spawning

  • Move IsMklEnabled() test to new module

  • Fix deprecation test for MKL DNN when OpenMP threads are set

  • Upgrading the checkerframework component to version 2.10.0

  • Integrated MKL input conversion op with MKL-DNN v1.x

Additional security and performance patches:

Intel® Optimizations for TensorFlow* 2.0.1

04 Aug 17:12

This release of Intel® Optimized TensorFlow is based on the TensorFlow v2.0.1 tag (https://github.com/tensorflow/tensorflow/tree/v2.0.1) as built with support for Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). For features and fixes that were introduced in TensorFlow 2.0.1, please see the TensorFlow 2.0.1 release notes. This build contains the following features and fixes:

Bug fixes:

  • Fixes a multiple .dist-info directories issue with pip 20+ in wheel files

  • Fixes a security vulnerability where converting a Python string to a tf.float16 value produces a segmentation fault (CVE-2020-5215)

Additional security and performance patches:

Intel® Optimizations for TensorFlow* 1.15.2

04 Aug 17:13

This release of Intel® Optimized TensorFlow is based on the TensorFlow v1.15.2 tag (https://github.com/tensorflow/tensorflow/tree/v1.15.2) as built with support for Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN). For features and fixes that were introduced in TensorFlow 1.15.2, please see the TensorFlow 1.15.2 release notes. This build contains the following features and fixes:

Bug fixes:

  • Fixes a multiple .dist-info directories issue with pip 20+ in wheel files

  • Fixes a security vulnerability where converting a Python string to a tf.float16 value produces a segmentation fault (CVE-2020-5215)

Additional security and performance patches: