v1.1.0
What's New
- Added optimizations for training with the FP32 and BF16 data types; a usage sketch follows the operator list. The optimized FP32/BF16 backward operators include:
  - Conv2d
  - Relu
  - Gelu
  - Linear
  - Pooling
  - BatchNorm
  - LayerNorm
  - Cat
  - Softmax
  - Sigmoid
  - Split
  - Embedding_bag
  - Interaction
  - MLP
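As a usage sketch for the BF16 training path: the snippet below is minimal and non-authoritative. The module name `intel_pytorch_extension`, the `enable_auto_mixed_precision` call, and the `ipex.DEVICE` handle are assumptions about this release's API rather than confirmed names; everything else is standard PyTorch.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name for this release

# Assumed API: route eligible ops through BF16 so the optimized
# BF16 backward operators listed above are exercised.
ipex.enable_auto_mixed_precision(mixed_dtype=torch.bfloat16)

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).to(ipex.DEVICE)  # ipex.DEVICE is an assumed device handle
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 64).to(ipex.DEVICE)
y = torch.randn(8, 64).to(ipex.DEVICE)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()  # runs the optimized backward operators (Linear, Relu, ...)
optimizer.step()
```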
- More fusion patterns are supported and validated in this release; a sketch of exercising a fused pattern follows the table:

  | Fusion Pattern | Release |
  | -------------- | ------- |
  | Conv + Sum     | v1.0    |
  | Conv + BN      | v1.0    |
  | Conv + Relu    | v1.0    |
  | Linear + Relu  | v1.0    |
  | Conv + Eltwise | v1.1    |
  | Linear + Gelu  | v1.1    |
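The model below contains a Conv2d followed by ReLU, matching the Conv + Relu entry in the table. It assumes fusion is applied while the TorchScript graph is optimized under the extension; `torch.jit.trace` is standard PyTorch, while `intel_pytorch_extension` and `ipex.DEVICE` are assumptions, as above.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name, as above

# Conv2d -> ReLU matches the "Conv + Relu" fusion pattern.
model = nn.Sequential(nn.Conv2d(3, 16, kernel_size=3), nn.ReLU()).eval().to(ipex.DEVICE)
x = torch.randn(1, 3, 224, 224).to(ipex.DEVICE)

with torch.no_grad():
    traced = torch.jit.trace(model, x)  # fusion assumed to happen during graph optimization
    out = traced(x)
```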
- Added Docker support.
- [Alpha] Multi-node training with oneCCL support; a minimal sketch follows.
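A minimal sketch of the oneCCL-backed distributed setup: it assumes the oneCCL bindings are importable as `torch_ccl` and register a `ccl` backend with `torch.distributed`; the rendezvous environment variables are the standard PyTorch ones and are normally provided by the launcher (e.g. mpirun).

```python
import os
import torch
import torch.distributed as dist
import torch_ccl  # assumed module name; importing is assumed to register the 'ccl' backend

# RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT come from the launcher;
# the defaults below only enable a single-process smoke test.
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
os.environ.setdefault('RANK', '0')
os.environ.setdefault('WORLD_SIZE', '1')

dist.init_process_group(backend='ccl')

# All-reduce a tensor across ranks, as DDP does for gradients.
t = torch.ones(4)
dist.all_reduce(t)
print(f"rank {dist.get_rank()}: {t}")

dist.destroy_process_group()
```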
- [Alpha] INT8 inference optimization.
Performance
- The release has daily automated testing for the supported models: ResNet50, ResNext101, Huggingface Bert, DLRM, Resnext3d, and Transformer. With the extension imported, BF16 training can deliver 1.2x to 1.7x speedups over FP32 training on 3rd Gen Intel Xeon Scalable processors (formerly codenamed Cooper Lake).
Known issue
- Some workloads may crash after several iterations when the extension is used with jemalloc enabled.