
v1.1.0

@EikanWang released this 12 Nov 07:24

What's New

  • Added optimizations for training with the FP32 and BF16 data types; a BF16 training-step sketch is included after this list. The optimized FP32/BF16 backward operators include:

    • Conv2d
    • Relu
    • Gelu
    • Linear
    • Pooling
    • BatchNorm
    • LayerNorm
    • Cat
    • Softmax
    • Sigmoid
    • Split
    • Embedding_bag
    • Interaction
    • MLP
  • More fusion patterns are supported and validated in this release; see the table below and the fusion sketch after this list:

    Fusion Patterns      Release
    Conv + Sum           v1.0
    Conv + BN            v1.0
    Conv + Relu          v1.0
    Linear + Relu        v1.0
    Conv + Eltwise       v1.1
    Linear + Gelu        v1.1
  • Added Docker support.

  • [Alpha] Multi-node training with oneCCL support (a distributed-setup sketch follows this list).

  • [Alpha] INT8 inference optimization (a calibration sketch follows this list).
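
For the FP32/BF16 training item above, a minimal training-step sketch is shown below. It is a sketch under assumptions, not the documented API: the module name intel_pytorch_extension, the ipex.DEVICE handle, and the enable_auto_mixed_precision(mixed_dtype=torch.bfloat16) call are taken from the extension's early releases and may not match this version exactly.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name for this release

# Assumed call: enable BF16 auto mixed precision so the optimized BF16/FP32
# backward kernels listed above are exercised during training.
ipex.enable_auto_mixed_precision(mixed_dtype=torch.bfloat16)

# Small model built mostly from operators in the optimized-backward list
# (Conv2d, BatchNorm, Relu, Linear).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
).to(ipex.DEVICE)  # assumed device handle exposed by the extension

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

data = torch.randn(8, 3, 32, 32).to(ipex.DEVICE)
target = torch.randint(0, 10, (8,)).to(ipex.DEVICE)

# One training step: the backward pass runs the optimized FP32/BF16 kernels.
optimizer.zero_grad()
loss = criterion(model(data), target)
loss.backward()
optimizer.step()
```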
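
For the fusion patterns table above, the sketch below defines a model that exposes two of the v1.1 patterns, Conv + Eltwise (with ReLU as the elementwise op) and Linear + Gelu. Tracing the model with torch.jit so the extension's graph passes can see the whole graph is an assumption about how fusion is triggered; the extension module name and device handle are assumptions as well.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import intel_pytorch_extension as ipex  # assumed module name for this release

class FusibleBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 32 * 32, 64)

    def forward(self, x):
        x = F.relu(self.conv(x))                   # Conv + Eltwise candidate
        x = F.gelu(self.fc(torch.flatten(x, 1)))   # Linear + Gelu candidate
        return x

model = FusibleBlock().eval().to(ipex.DEVICE)  # assumed device handle
example = torch.randn(1, 3, 32, 32).to(ipex.DEVICE)

with torch.no_grad():
    # Tracing produces a TorchScript graph that graph-level fusion passes
    # can rewrite; whether fusion fires depends on the extension's passes.
    traced = torch.jit.trace(model, example)
    out = traced(example)
```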
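
For the multi-node training item, the sketch below shows a distributed setup over the oneCCL backend. The torch_ccl package name and the 'ccl' backend string are assumptions based on the oneCCL bindings for PyTorch; rank and world-size handling is left to the launcher.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
import torch_ccl  # assumed oneCCL bindings; importing registers the 'ccl' backend

# Assumed launch environment: RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT
# come from the launcher (e.g. mpirun); the defaults below only make a
# single-process run possible.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(
    backend="ccl",  # oneCCL collective backend
    rank=int(os.environ.get("RANK", "0")),
    world_size=int(os.environ.get("WORLD_SIZE", "1")),
)

# Gradients are all-reduced across processes through oneCCL by DDP.
model = nn.parallel.DistributedDataParallel(nn.Linear(128, 10))

data = torch.randn(32, 128)
loss = model(data).sum()
loss.backward()
```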
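
For the alpha INT8 item, the calibration sketch below is based on the AmpConf/AutoMixPrecision names used by the extension's early INT8 path; since the feature is alpha here, treat every extension call in this block as an assumption.

```python
import torch
import torch.nn as nn
import intel_pytorch_extension as ipex  # assumed module name for this release

# Assumed alpha API: ipex.AmpConf(torch.int8) collects quantization parameters
# while ipex.AutoMixPrecision runs the model in calibration mode.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
).eval().to(ipex.DEVICE)  # assumed device handle

conf = ipex.AmpConf(torch.int8)
calibration_batches = [torch.randn(1, 3, 32, 32) for _ in range(4)]

with torch.no_grad():
    for batch in calibration_batches:
        with ipex.AutoMixPrecision(conf, running_mode="calibration"):
            model(batch.to(ipex.DEVICE))

# Save the calibrated scales for later INT8 inference runs.
conf.save("int8_configure.json")
```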

Performance

  • The release is covered by daily automated testing of the supported models: ResNet50, ResNeXt101, Hugging Face BERT, DLRM, ResNeXt3D, and Transformer. With the extension imported, BF16 training delivers up to 1.2x~1.7x performance improvement over FP32 training on 3rd Gen Intel Xeon Scalable processors (formerly codenamed Cooper Lake).

Known issue

  • Some workloads may crash after several iterations when the extension is used with jemalloc enabled.