
v1.1.0

@EikanWang released this 12 Nov 07:24

What's New

  • Added optimizations for training with the FP32 and BF16 data types; a BF16 training-step sketch is included after this list. The optimized FP32/BF16 backward operators include:

    • Conv2d
    • Relu
    • Gelu
    • Linear
    • Pooling
    • BatchNorm
    • LayerNorm
    • Cat
    • Softmax
    • Sigmoid
    • Split
    • Embedding_bag
    • Interaction
    • MLP
  • More fusion patterns are supported and validated in this release; see the table below and the fusion sketch after this list:

    Fusion Patterns      Release
    Conv + Sum           v1.0
    Conv + BN            v1.0
    Conv + Relu          v1.0
    Linear + Relu        v1.0
    Conv + Eltwise       v1.1
    Linear + Gelu        v1.1
  • Added Docker support.

  • [Alpha] Multi-node training with oneCCL support (a distributed-setup sketch follows this list).

  • [Alpha] INT8 inference optimization (a calibration sketch follows this list).
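
For the FP32/BF16 training item above, a minimal training-step sketch is shown below. It is a sketch under assumptions, not the documented API: the module name intel_pytorch_extension, the ipex.DEVICE handle, and the enable_auto_mixed_precision(mixed_dtype=torch.bfloat16) call are taken from the extension's early releases and may not match this version exactly.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name for this release

# Assumed call: enable BF16 auto mixed precision so the optimized BF16/FP32
# backward kernels listed above are exercised during training.
ipex.enable_auto_mixed_precision(mixed_dtype=torch.bfloat16)

# Small model built mostly from operators in the optimized-backward list
# (Conv2d, BatchNorm, Relu, Linear).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
).to(ipex.DEVICE)  # assumed device handle exposed by the extension

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

data = torch.randn(8, 3, 32, 32).to(ipex.DEVICE)
target = torch.randint(0, 10, (8,)).to(ipex.DEVICE)

# One training step: the backward pass runs the optimized FP32/BF16 kernels.
optimizer.zero_grad()
loss = criterion(model(data), target)
loss.backward()
optimizer.step()
```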
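
For the fusion patterns table above, the sketch below defines a model that exposes two of the v1.1 patterns, Conv + Eltwise (with ReLU as the elementwise op) and Linear + Gelu. Tracing the model with torch.jit so the extension's graph passes can see the whole graph is an assumption about how fusion is triggered; the extension module name and device handle are assumptions as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import intel_pytorch_extension as ipex  # assumed module name for this release

class FusibleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 32 * 32, 64)

    def forward(self, x):
        x = F.relu(self.conv(x))                   # Conv + Eltwise candidate
        x = F.gelu(self.fc(torch.flatten(x, 1)))   # Linear + Gelu candidate
        return x

model = FusibleBlock().eval().to(ipex.DEVICE)  # assumed device handle
example = torch.randn(1, 3, 32, 32).to(ipex.DEVICE)

with torch.no_grad():
    # Tracing produces a TorchScript graph that graph-level fusion passes
    # can rewrite; whether fusion fires depends on the extension's passes.
    traced = torch.jit.trace(model, example)
    out = traced(example)
```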
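
For the multi-node training item, the sketch below shows a distributed setup over the oneCCL backend. The torch_ccl package name and the 'ccl' backend string are assumptions based on the oneCCL bindings for PyTorch; rank and world-size handling is left to the launcher.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch_ccl  # assumed oneCCL bindings; importing registers the 'ccl' backend

# Assumed launch environment: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT
# come from the launcher (e.g. mpirun); the defaults below only make a
# single-process run possible.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(
    backend="ccl",  # oneCCL collective backend
    rank=int(os.environ.get("RANK", "0")),
    world_size=int(os.environ.get("WORLD_SIZE", "1")),
)

# Gradients are all-reduced across processes through oneCCL by DDP.
model = nn.parallel.DistributedDataParallel(nn.Linear(128, 10))

data = torch.randn(32, 128)
loss = model(data).sum()
loss.backward()
```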
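
For the alpha INT8 item, the calibration sketch below is based on the AmpConf/AutoMixPrecision names used by the extension's early INT8 path; since the feature is alpha here, treat every extension call in this block as an assumption.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name for this release

# Assumed alpha API: ipex.AmpConf(torch.int8) collects quantization parameters
# while ipex.AutoMixPrecision runs the model in calibration mode.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
).eval().to(ipex.DEVICE)  # assumed device handle

conf = ipex.AmpConf(torch.int8)
calibration_batches = [torch.randn(1, 3, 32, 32) for _ in range(4)]

with torch.no_grad():
    for batch in calibration_batches:
        with ipex.AutoMixPrecision(conf, running_mode="calibration"):
            model(batch.to(ipex.DEVICE))

# Save the calibrated scales for later INT8 inference runs.
conf.save("int8_configure.json")
```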

Performance

  • The release is covered by daily automated testing of the supported models: ResNet50, ResNeXt101, Hugging Face BERT, DLRM, ResNeXt3D, and Transformer. With the extension imported, BF16 training delivers up to 1.2x~1.7x performance improvement over FP32 training on 3rd Gen Intel Xeon Scalable processors (formerly codenamed Cooper Lake).

Known issue

  • Some workloads may crash after several iterations when the extension is used with jemalloc enabled.