v2.7.1

vpirogov released this 21 Oct 22:48
· 87 commits to rls-v2.7 since this release

This is a patch release containing the following changes to v2.7:

  • Fixed performance regression for batch normalization primitive in TBB and threadpool configurations (cd953e4)
  • Improved grouped convolution performance on Xe Architecture GPUs (d7a781e, cb1f3fe, 4e84474, 7ba3c40)
  • Fixed runtime error in int8 reorder on Intel GPUs (53532a9)
  • Reverted MEMFD allocator in Xbyak to avoid segfaults in high load scenarios (3e29ae2)
  • Fixed a defect with incorrect caching of BRGEMM-based matmul primitive implementations with trivial dimensions (87cd979)
  • Improved depthwise convolution performance with per-tensor binary post-ops for Intel CPUs (f430a5a)
  • Extended threadpool API to manage maximum concurrency (8a1e959, 64e5594)
  • Fixed potential integer overflow in BRGEMM-based convolution implementation (25ccee3)
  • Fixed performance regression in concat primitive with the 'any' memory format on Intel CPUs (2a60ade, feb614d)
  • Fixed compile-time warnings in matmul_perf example (b5faa77)
  • Fixed 'insufficient registers in requested bundle' runtime error in convolution primitive on Xe Architecture GPUs (4c9d46a)
  • Addressed performance regression for certain convolution cases on Xe Architecture GPUs (f28b58a, 18764fb)
  • Added support for Intel DPC++/C++ Compiler 2023 (c3781c6, a1a8952, 9bc87e6, e3b1987)
  • Fixed int8 matmul and inner product performance regression on Xe Architecture GPUs (3693fbf, c8adc17)
  • Fixed accuracy issue for convolution, inner product and matmul primitives with tanh post-op on Xe Architecture GPUs (88b4e57, 83ce6d2, 6224dc6, 10f0d0a)
  • Suppressed spurious build warnings with GCC 11 (44255a8)