
Releases: ROCm/aotriton

AOTriton 0.7.1 Beta

04 Oct 21:50
f6b28a9

This is a point release.

0.7.1b can be used as a drop-in replacement for the 0.7b shared object file.

What's Changed

Full Changelog: 0.7b...0.7.1b

AOTriton 0.7 Beta

23 Aug 16:19
9be0406

What's Changed

  • Default to Shared Object by @jithunnair-amd in #33
  • Add varlen support to AOTriton's Flash Attention by @xinyazhang in #31
  • Switch to upstream Triton compiler, and related changes by @xinyazhang in #36
  • Improve Backward Performance and Experimental Navi31 Support by @xinyazhang in #39
    • Introduce new tuning system based on pre-compiled GPU kernels
    • Navi31 support is still experimental
  • Support hipGraph usage in PyTorch by @xinyazhang in #40
    • This changes the RNG API used by FA kernels.
    • Switch to new testing scheme to match PyTorch 2.5's changes
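To illustrate what "varlen" support usually means for a Flash Attention kernel, here is a minimal sketch of packing a batch of variable-length sequences into one flat buffer plus a cumulative-offset array. It assumes the common FlashAttention convention of a `cu_seqlens` array (`[0, l0, l0+l1, ...]`) and a maximum sequence length; the helper names `cu_seqlens`, `pack`, and `unpack` are hypothetical and are not AOTriton's API.

```python
from itertools import accumulate


def cu_seqlens(seq_lens):
    """Cumulative offsets [0, l0, l0+l1, ...] into a packed buffer (hypothetical helper)."""
    return [0, *accumulate(seq_lens)]


def pack(sequences):
    """Concatenate variable-length sequences without padding.

    Returns the flat token buffer, the cumulative offsets, and the longest
    sequence length (varlen kernels typically take the max length as a bound).
    """
    flat = [tok for seq in sequences for tok in seq]
    offsets = cu_seqlens([len(s) for s in sequences])
    max_len = max((len(s) for s in sequences), default=0)
    return flat, offsets, max_len


def unpack(flat, offsets, i):
    """Recover sequence i from the packed buffer using its start/end offsets."""
    return flat[offsets[i]:offsets[i + 1]]
```

For example, `pack([[1, 2, 3], [4], [5, 6]])` yields offsets `[0, 3, 4, 6]` and a max length of 3, so sequence `i` occupies `flat[offsets[i]:offsets[i + 1]]` and no padding tokens are stored.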

New Contributors

Full Changelog: 0.6b...0.7b

Preview 2 of 0.7b

04 Aug 18:04

The tuning database for Preview 1 was generated with a newer Triton kernel that no longer uses block pointers. However, Preview 1 does not include those kernel changes.

Preview 2 was created to fix this.

Preview 1 of 0.7b

04 Aug 08:42

Preview 1 of 0.7b.

What's Changed

  1. Switch to Triton upstream compiler
  2. Improved backward kernel performance with a better tuning database (not fully accomplished in this preview; see Preview 2 for this feature)
  3. Add Navi31 support
  4. Default to AOTRITON_COMPRESS_KERNEL=ON
  5. Requires zstd as runtime dependency

Known problems

  1. No Navi32 support
  2. Missing library changes (notably ABI-breaking ones) that were used to generate the tuning_database.sqlite3 shipped in this preview.

AOTriton 0.4.2 Beta

02 Aug 22:08

manylinux_2_28 updates to 0.4.1b

AOTriton 0.6 Beta

06 Jun 02:52
04b5df8

What's Changed

  • Resolve cmake conflicts when adding aotriton into TE via add_subdirectory by @wangye805 in #23
  • [mGPU] Run hipModuleLoadDataEx for each GPU device. by @xinyazhang in #24
  • Adding mutex.h for TE pytorch extension compilation by @wangye805 in #26
  • Refactor the build system by @xinyazhang in #29

New Contributors

Full Changelog: 0.5b...0.6b

AOTriton 0.5 Beta

08 May 00:47
00ccbf3

What's Changed

  • Switch Tuning database to SQLite3 for Incremental Tuning
  • Add matrix bias to forward/backward kernel
  • Fix build failures due to missing
  • Add new Triton kernel debug_fill_dropout_rng for debugging dropout
  • Add FP32 support to fulfill the functionalities required by torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION
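The move to SQLite3 for incremental tuning can be pictured with a small sketch: each tuning result is upserted under a composite key, so new measurements add or replace individual rows instead of regenerating the whole database. The table layout, column names, architecture/kernel labels, and config strings below are hypothetical illustrations, not the actual schema of AOTriton's tuning database.

```python
import sqlite3

# Hypothetical schema for illustration only; the real tuning database layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tuning (
    arch        TEXT    NOT NULL,
    kernel      TEXT    NOT NULL,
    seqlen_q    INTEGER NOT NULL,
    seqlen_k    INTEGER NOT NULL,
    head_dim    INTEGER NOT NULL,
    best_config TEXT    NOT NULL,
    PRIMARY KEY (arch, kernel, seqlen_q, seqlen_k, head_dim)
)
"""


def open_db(path=":memory:"):
    """Open (or create) a tuning database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, arch, kernel, seqlen_q, seqlen_k, head_dim, best_config):
    """Upsert one tuning result; rows with other keys are left untouched."""
    conn.execute(
        "INSERT OR REPLACE INTO tuning VALUES (?, ?, ?, ?, ?, ?)",
        (arch, kernel, seqlen_q, seqlen_k, head_dim, best_config),
    )
    conn.commit()


def lookup(conn, arch, kernel, seqlen_q, seqlen_k, head_dim):
    """Return the stored best config for an exact key, or None if not yet tuned."""
    row = conn.execute(
        "SELECT best_config FROM tuning "
        "WHERE arch = ? AND kernel = ? AND seqlen_q = ? AND seqlen_k = ? AND head_dim = ?",
        (arch, kernel, seqlen_q, seqlen_k, head_dim),
    ).fetchone()
    return row[0] if row else None
```

Re-tuning a single shape (for example, calling `record` again for the same `arch`/`kernel`/shape key) replaces only that row, which is what makes tuning incremental under this scheme.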

Notes about binary delivery

Starting with 0.5 Beta, we are, for now, not delivering AOTriton in binary form alongside software releases, due to software supply-chain security considerations.

Full Changelog: 0.4.1b...0.5b

AOTriton 0.4.1 Beta

29 Mar 20:14

Summary

This is an emergency fix for the build process. It delivers the same functionality and performance as the 0.4b version. AOTriton users who do not need to build the library from source can keep using the 0.4b binary release packages.

Changes

  • Triton's setup.py downloads CUDA packages during the build, but the download does not always succeed. AOTriton does not need them for now, so they were commented out.

AOTriton 0.4 Beta

20 Mar 06:15
e537881

Summary

This is the first release which is considered sufficiently stable for production.

Features

AOTriton GA Preview for Legal Scan

19 Feb 20:21
2569660
Pre-release

This release is created for legal review before releasing to the public.

Compiled on ROCm 6.0, Ubuntu 20.04, Python 3.9