
V4.1.1 Performance improvements

@amcamd released this 26 Apr 20:55

Features

  • Support LSHL_ADD
  • Vectorize the store-C path
  • Enable DirectToLds for half
  • Fix sync with DirectToLds when PrefetchLocalRead=0
  • Optimize solution merging using lookup
  • Align MAC blocks when using half datatype
  • Add MI25 (device ID 6860) to vega10
  • Train for DataInitTypeBeta: 0
  • Add ResNet1x1 to Exact sizes
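For context on the first item: LSHL_ADD presumably refers to the Vega/GCN fused shift-left-and-add instruction (`v_lshl_add_u32`), which computes `D = (S0 << S1[4:0]) + S2` in a single instruction instead of a shift followed by an add. That mapping is an assumption, as is the idea that it is used for address arithmetic in the generated kernels. The sketch below only models the instruction's 32-bit semantics in Python. The function name `lshl_add_u32` is illustrative and is not a Tensile API.

```python
MASK32 = 0xFFFFFFFF  # results wrap around at 32 bits, like a VGPR


def lshl_add_u32(src0: int, src1: int, src2: int) -> int:
    """Model of D = (S0 << S1[4:0]) + S2 with 32-bit wraparound (illustrative)."""
    shift = src1 & 0x1F                      # only the low 5 bits of the shift amount are used
    shifted = (src0 << shift) & MASK32       # shift, truncated to 32 bits
    return (shifted + src2) & MASK32         # add, truncated to 32 bits


# Example: byte address of element 5 in a float32 buffer based at 0x1000
# (index << log2(sizeof(float))) + base, done as one fused operation.
addr = lshl_add_u32(5, 2, 0x1000)
print(hex(addr))  # 0x1014
```

Fusing the two operations saves one instruction (and one dependent-issue step) per address computation, which matters in tight inner loops.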