Add sub-byte data types: float4_e2m1fn, float6_e2m3fn, float6_e3m2fn #181
Conversation
Note: a small unrelated change in `_finfo.py` removes unreadable boilerplate and replaces it with (faster) dict lookups for instantiating `finfo` objects.
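As a rough illustration of the dict-lookup approach (hypothetical names and values, not the actual `_finfo.py` change):

```python
# Map each dtype name to a small factory so finfo instantiation is a single
# dictionary lookup instead of a chain of if/elif branches.
_FINFO_FACTORIES = {
    "float4_e2m1fn": lambda: dict(bits=4, eps=0.5, max=6.0),
    "float6_e2m3fn": lambda: dict(bits=6, eps=0.125, max=7.5),
}

def make_finfo(name):
    return _FINFO_FACTORIES[name]()  # raises KeyError for unsupported dtypes
```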
I'm trying to understand the relationship between these types and the MX types. From my quick read of the MX spec, all of the types it defines are block-scaled formats, which these types are not? Can you say more about the relationship and the use case for these?
The MXFP8 type is a pair of tensors (e.g., the 1st could have the E5M2 type and the 2nd the E8M0 type, with 32x fewer elements). Proper support of such an MX type (where a value has two different primitive types) is way too complicated, but we could instead use two values. This way a dot op with scaled inputs (what we're actually interested in) could be represented as a custom call with four input tensors. So, in order to implement MXFP8, we need the E8M0 primitive type in XLA (E5M2/E4M3 already exist). For MXFP4, we need both E8M0 and E2M1. Adding the FP6 types (E2M3 and E3M2) just for completeness; they are very similar and will unblock us in the future. All of these types are described in the MX spec: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
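To make the layout concrete, here is a rough NumPy sketch of the block-scaled idea (not XLA code; the E8M0 scales are modelled as power-of-two float32 values, and the 32-element block size is taken from the MX spec):

```python
import numpy as np
import ml_dtypes

BLOCK = 32  # MX block size: one shared scale per 32 elements

# Value tensor in an 8-bit element type, plus a per-block scale tensor.
values = np.random.randn(4, BLOCK).astype(ml_dtypes.float8_e5m2)
scales = np.exp2(np.array([0.0, 1.0, -2.0, 3.0], dtype=np.float32))

# Dequantize: every block of 32 values shares one scale.
dequant = values.astype(np.float32) * scales[:, None]

# A dot op with two scaled inputs would then take four tensors:
# (lhs_values, lhs_scales, rhs_values, rhs_scales), e.g. as a custom call.
```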
README.md
Outdated
@@ -66,6 +70,39 @@ A `bfloat16` number is a single-precision float truncated at 16 bits.

Exponent: 8, Mantissa: 7, exponent bias: 127. IEEE 754, with NaN and inf.

### `float4_e2m1`
Fix names to include the `fn` suffix.

I'm actually having trouble finding a great definition of the `f` and `n` suffixes, even in the LLVM discussion that added them: I don't suppose you have a link to the definition?

In particular, I'm not sure if `n` should appear in the name, given that `n` also appears in the suffix of FP8 types with a single NaN, but these have no NaN. So I'm a bit unclear what the suffix means.
`f` means "finite", `n` means "special NaN representation" (i.e. non-IEEE).

I saw this somewhere in the comments, will post a link once I find it.
Fixed the type name.
Oh, actually, it's in this same file, below: `F` is for "finite" (no infinities), `N` for special NaN encoding, `UZ` for unsigned zero.
I guess one could say that "no NaN encoding" is a "special NaN encoding".
Also, LLVM `APFloat.cpp` has these types with the "FN" suffix:
https://github.com/llvm/llvm-project/blob/5537ae87b3a87b3abeb4e6983cecd9b103648243/llvm/lib/Support/APFloat.cpp#L150
We could probably change the suffix, but we need to be consistent across the repositories.
We should agree with LLVM, so that works for me.
Not sure if you just haven't pushed the fix yet, but the headers are still suffix-less.
Fixed now.
README.md
Outdated
Microscaling format, 4 bits (encoding: `0bSEEM`) using byte storage (higher 4 bits are unused). NaN representation is undefined.

Possible values: [0, 0.5, 1, 1.5, 2, 3, 4, 6]
I'd probably stick backticks around the values
What about the negative values?
Added backticks around the values (here and below).
Changed to "Possible absolute values" to keep the list short.
ml_dtypes/_finfo.py
Outdated
obj.epsneg = 0.125
obj.machep = -3
obj.negep = -3
obj.max = float6_e2m3fn(7.5)
I'd personally be tempted to specify these as bit patterns (`float.fromhex("0x1234.1")`, IIRC).
Updated.
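For illustration, a hedged sketch (not the merged `_finfo.py` code) of what the hex-float style looks like for `float6_e2m3fn`:

```python
from ml_dtypes import float6_e2m3fn

# E2M3 with bias 1: max = 1.875 * 2**2 == 7.5, eps = 2**-3, smallest normal = 1.0
max_value = float6_e2m3fn(float.fromhex("0x1.ep+2"))
eps = float.fromhex("0x1p-3")              # matches machep = -3 above
smallest_normal = float.fromhex("0x1p+0")
print(max_value, eps, smallest_normal)
```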
ml_dtypes/_src/dtypes.cc
Outdated
@@ -366,6 +423,14 @@ bool Initialize() {
success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e4m3fn, float>();
success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e5m2, float>();
success &= RegisterTwoWayCustomCast<float8_e3m4, float8_e4m3, float>();
This is getting unwieldy. This is just covering all-pairs of extension types, I think? I suspect this can be factored better with some template trickery.

If nothing else, the function you added called `RegisterCustomCastsWithBfloat16AndFloat8Types` could just be used everywhere here, and you call it once for each type? Probably possible to do better than that with some template cunning.
Added some templates to reduce boilerplate in this file.
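As a hedged sketch of the kind of template trickery discussed (not necessarily the exact code merged here), a C++17 variadic helper can cover every pair in a type list with one call, reusing the `RegisterTwoWayCustomCast` helper from the diff above:

```cpp
template <typename T, typename... Ts>
bool RegisterCustomCastsForAllPairs() {
  bool success = true;
  // Register T <-> each of the remaining types (casting via float)...
  ((success &= RegisterTwoWayCustomCast<T, Ts, float>()), ...);
  // ...then recurse over the tail so every remaining pair is covered too.
  if constexpr (sizeof...(Ts) > 1) {
    success &= RegisterCustomCastsForAllPairs<Ts...>();
  }
  return success;
}

// Usage inside Initialize():
//   success &= RegisterCustomCastsForAllPairs<
//       float4_e2m1fn, float6_e2m3fn, float6_e3m2fn,
//       float8_e4m3fn, float8_e5m2, float8_e3m4>();
```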
Looks good, other than the clang-format failure.
Add MX floating point types (f4E2M1FN, f6E2M3FN, f6E3M2FN, f8E8M0FNU) (#2581)

This is a proposal to add MX (microscaling) floating point types to StableHLO. Related links:
- StableHLO [PR#2582](#2582) Add MX floating point types (f4E2M1FN, f6E2M3FN, f6E3M2FN, f8E8M0FNU)
- LLVM [PR#95392](llvm/llvm-project#95392) [APFloat] Add APFloat support for FP4 data type
- LLVM [PR#94735](llvm/llvm-project#94735) [APFloat] Add APFloat support for FP6 data types
- LLVM [PR#107127](llvm/llvm-project#107127) [APFloat] Add APFloat support for E8M0 type
- LLVM [PR#108877](llvm/llvm-project#108877) [MLIR] Add f4E2M1FN type
- LLVM [PR#107999](llvm/llvm-project#107999) [MLIR] Add f6E2M3FN type
- LLVM [PR#105573](llvm/llvm-project#105573) [MLIR] Add f6E3M2FN type
- LLVM [PR#111028](llvm/llvm-project#111028) [MLIR] Add f8E8M0FNU type
- JAX-ML [PR#181](jax-ml/ml_dtypes#181) Add sub-byte data types: float4_e2m1fn, float6_e2m3fn, float6_e3m2fn
- JAX-ML [PR#166](jax-ml/ml_dtypes#166) Add float8_e8m0_fnu (E8M0) OCP MX scale format
This PR adds MX (microscaling) floating point types support.

`F4e2m1`, `F6e2m3`, `F6e3m2` types are proposed in the OpenCompute MX Specification. These types have the following notable features:
- no `nan` encoding, only finite values are supported;
- no `inf` encoding, similar to the existing 8-bit types with the `fn` suffix;
- sub-byte size, as with the existing `int2` and `int4` types.

Related PRs: