Document NaN policy #48523

LilithHafner · 2023-02-04T16:58:58Z

There are a bunch of different NaN values. reinterpret(Float64, reinterpret(UInt64, NaN) + 1) and NaN are two examples.
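To make this concrete, here is a minimal sketch (the variable names are only for illustration) of two NaNs that differ in their bits yet look the same to isnan and isequal:

```julia
# Two NaNs with different bit patterns.
a = NaN
b = reinterpret(Float64, reinterpret(UInt64, NaN) + 1)

# Both count as NaN.
both_nan = isnan(a) && isnan(b)

# isequal treats every NaN as equal to every other NaN,
# while === on Float64 compares the underlying bits.
equal_by_isequal = isequal(a, b)
identical_bits = a === b

println(bitstring(a))
println(bitstring(b))
println((both_nan, equal_by_isequal, identical_bits))
```

The two bitstrings differ only in the lowest mantissa bit, which is the kind of difference this issue is about.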

NaNs propagate through floating point operations. sin(NaN) must be NaN in the sense of isnan(sin(NaN)), but which NaN should it return? Must it return the canonical NaN? Must it return its input? May it return some other number that isnan for performance reasons? These questions come up for most math functions, min/max/sort, and possibly others.

I propose to explicitly document that mathematical functions (e.g. sin, hypot, min) will produce a NaN result on NaN input but that which NaN is produced is an implementation detail.

mikmoore · 2023-02-06T17:00:20Z

In defiance of "never say never", ~~it's not a horrible bet that literally no Julia user relies on the NaN semantics of any function beyond isnan(f(NaN))~~ EDIT: someone below says that payload tagging is used in SentinelArrays.jl. There would be a cost to maintaining this behavior when the basic functions do not. For how exceedingly rarely somebody cares and how relatively cheaply one can wrap a function for a specific semantic, I think we can afford to adopt any-NaN semantics (a name I made up, meaning that what payload is produced from NaN inputs is unspecified). Further, any-NaN semantics leave room to make non-breaking changes in the future if the landscape shifts such that people do actually care about payloads. It's also robust to hardware that makes unusual NaN propagation choices (~~I don't think IEEE754 dictates a specific semantic they must follow~~ EDIT: see end of post).

Up to now, almost every function has been implemented with an input-NaN semantic (another name I made up, specifying that the payload of one NaN input is propagated to the output). This is also what's usually (but perhaps not always?) used by hardware-native operations. In fact, our current input-NaN semantic is usually contingent on hardware input-NaN semantics.

There is a risk that this results in re-implementations of functions that "break" existing behaviors if somebody really did rely on payload propagation. For example, I believe there is a faster min -- but maybe not max -- on x86 if you're willing to mangle payloads. But there was never any formal guarantee and I'm not sure that anybody ever cared.

Is the proposal to document this centrally or on a per-function basis? Per-function seems like it would be a never-finished job and add noise to docstrings, so I'd propose to only document it centrally. Functions which have notable deviations should be documented locally. For example, that hypot is not poisoned by NaN in the presence of Inf.
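As a small illustration of the hypot deviation mentioned above, a sketch of what one would expect to see in the REPL (the variable names are only for illustration):

```julia
# hypot follows the IEEE 754 convention that an infinite argument
# wins over a NaN argument.
r1 = hypot(Inf, NaN)
r2 = hypot(NaN, -Inf)

# With no infinite argument, the NaN propagates as usual.
r3 = hypot(1.0, NaN)

println((r1, r2, r3))
```

This is the sort of exception that seems worth a line in the function's own docstring, while the general payload policy lives in one central place.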

EDIT:
Originally, I was unsure of IEEE754's stance on payload propagation. The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.

LilithHafner · 2023-02-06T17:20:04Z

If no users care that would make things easier. I posted on slack and discourse for higher visibility.

We could also run pkgeval on a branch that mangles all NaNs, but that seems like a lot of work.

stevengj · 2023-02-06T17:35:49Z

See also this IEEE standards document for some more background. The most common extant applications of NaN payloads seem to be (a) tracking exception types and (b) tracking NA (missing) values in R, neither of which is especially critical in Julia (because we normally use exceptions and missing values, respectively). (I could imagine some Julia application using R-style tagged NaNs instead of Union{Missing,Float64} for performance/memory reasons, I guess?) There is also JavaScript-style NaN boxing, which seems even less likely in Julia. The IEEE document also mentions some general issues with trying to propagate NaN payloads.
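For readers unfamiliar with payload tagging, here is a minimal sketch of the idea. The helpers tagged_nan and nan_tag are hypothetical (they are not part of Base or any package), and the tag value 42 is arbitrary:

```julia
# Hypothetical helpers for tagging NaNs; not part of Base or any package.
# A Float64 quiet NaN has all exponent bits set and the top mantissa bit set
# (0x7ff8000000000000), which leaves the low 51 bits free for a payload.
tagged_nan(tag::Integer) =
    reinterpret(Float64, 0x7ff8000000000000 | (UInt64(tag) & 0x0007ffffffffffff))

# Recover the payload by masking off everything but the low 51 bits.
nan_tag(x::Float64) = reinterpret(UInt64, x) & 0x0007ffffffffffff

na = tagged_nan(42)
println(isnan(na))        # the tagged value is still a NaN
println(Int(nan_tag(na))) # the tag can be read back from the bits
println(Int(nan_tag(NaN)))
```

Whether such a tag survives a call like sin(na) or na + 1.0 is exactly what a payload policy would or would not promise.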

quinnj · 2023-02-06T17:56:11Z

We use an R-style tagged-NaN in SentinelArrays.jl.

Specifically, this NaN:

julia> Core.bitcast(Float64, typemax(UInt64))
NaN

because we do a memset with 0xff on the Vector{Float64} to set missing.
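A rough sketch of the effect described here, assuming nothing about how SentinelArrays.jl actually implements it: filling every byte of a Float64 buffer with 0xff yields the all-ones bit pattern, which is a NaN.

```julia
# Sketch only: this mimics the effect of a 0xff memset and is not
# SentinelArrays.jl's actual code.
v = zeros(Float64, 4)
fill!(reinterpret(UInt8, v), 0xff)  # set every byte of the buffer to 0xff

# Every element is now the all-ones bit pattern, which is a NaN.
sentinel = Core.bitcast(Float64, typemax(UInt64))
all_nan = all(isnan, v)
all_sentinel = all(x -> x === sentinel, v)

println(bitstring(v[1]))
println((all_nan, all_sentinel))
```

Distinguishing this sentinel from an ordinary NaN requires a bitwise comparison such as ===, so it only works if the sentinel's bits are not rewritten along the way.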

mikmoore · 2023-02-06T18:38:08Z

It seems that some people do use payload tagging in some cases.

Further,

> The document linked by a later poster suggests that "The current standard specifies that if an operation has multiple NaN inputs, then the result should be one of the input NaNs. The standard does not specify which one." I assume this extends to unary functions as well.

If true, this would mean that to reject payload propagation semantics would be to violate IEEE754 semantics on any function defined therein. I'm not excited about this prospect.

Let's talk cost/benefit. Are there functions that we would implement differently with loosened NaN semantics? I mentioned a small optimization of min on x86 (not aarch64) but it wouldn't be game-changing. Any others?

It seems that, if anything, we might have to document a general policy (although perhaps not a strict guarantee) that a NaN output resulting from one or more NaN inputs should include the payload of one of the NaN inputs. We'd have to adhere to this policy for IEEE754 functions but probably should in other cases as well.
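One way such a policy could be checked is sketched below. The helper returns_input_nan and the toy function pick_first_nan are hypothetical names introduced only for this example:

```julia
# Hypothetical helper: does f's NaN result carry the exact bits of one of
# its NaN inputs?
function returns_input_nan(f, args::Float64...)
    r = f(args...)
    isnan(r) || return false
    return any(a -> isnan(a) && a === r, args)
end

# Two NaNs with distinct payloads.
x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0x01)
y = reinterpret(Float64, reinterpret(UInt64, NaN) | 0x02)

# A plain selection between the inputs keeps the bits intact.
pick_first_nan(a, b) = isnan(a) ? a : b
println(returns_input_nan(pick_first_nan, x, y))

# A function that always produces the literal NaN drops the payload.
println(returns_input_nan((a, b) -> NaN, x, y))

# For hardware arithmetic the answer depends on the platform,
# so no particular output is claimed here.
println(returns_input_nan(+, x, y))
```

Note that the check accepts any input NaN, not a particular positional one, which matches the "one of the NaN inputs" wording.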

StefanKarpinski · 2023-02-06T22:41:50Z

It seems like for any function that returns NaN when one of the inputs is NaN, we can try to return the NaN that was passed in. That's how hardware float operations work, so it often happens naturally. In places where we "generate" a NaN, we should produce the "standard NaN", namely the one you get when you evaluate NaN.

andrewjradcliffe · 2023-02-08T03:39:02Z

Somewhat related food for thought.

Propagation of NaN payloads through various functions in Base is haphazard at best -- and I am not suggesting that it must be uniform! -- but this fact is likely (happily) overlooked by the vast majority of users and developers. Bearing in mind the limitations imposed by LLVM, it is worthwhile to question what might be done.

sin is a simple example where we do something in Julia which mangles a payload (i.e. we return the "standard NaN").

The code below demonstrates some of the heterogeneity.

x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff);
for f in (sin, cos, tan, acos, asin, atan, log, exp, sqrt, abs2)
    println(f, "\t:\t", bitstring(f(x)))
end

If we want to follow Stefan's logic, then all occurrences which amount to isnan(x) && return NaN must instead be isnan(x) && return x. This is easily done and, at least conceptually, carries no penalty. The test suite may inadvertently rely on the extant behavior, but such cases should not be too numerous in Base. The ecosystem at large may rely on the haphazard NaN behavior for testing (i.e. silencing of payloads by some functions); I suppose PkgEval could measure the extent of the damage.
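The difference between the two patterns can be sketched with a toy function. Both halve_canonical and halve_propagating are hypothetical and are not how anything in Base is implemented:

```julia
# Hypothetical toy functions; not Base's implementation of anything.

# Variant A discards the payload by returning the literal NaN.
function halve_canonical(x::Float64)
    isnan(x) && return NaN
    return x / 2
end

# Variant B returns the NaN it was given, payload included.
function halve_propagating(x::Float64)
    isnan(x) && return x
    return x / 2
end

x = reinterpret(Float64, reinterpret(UInt64, NaN) | 0xff)
println(bitstring(halve_canonical(x)))
println(bitstring(halve_propagating(x)))
```

The first line printed ends in zeros and the second keeps the 0xff payload in its low bits; for non-NaN inputs the two variants behave identically.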

oscardssmith · 2023-02-08T04:24:58Z

That sin example is a good catch. A return x will be a bit faster since you don't have to load a new NaN value and can just return the one you already have in a register. That said, in general I don't really want to document NaN behavior, since, especially for two-argument functions, I could see it being useful in some cases to build NaNs from arbitrary combinations of the bits of the input NaNs.

andrewjradcliffe · 2023-02-09T01:29:33Z

> I don't really want to document NaN behavior since...

I concur on leaving NaN behavior undocumented. Strategic ambiguity, particularly in light of the uncertainty about what might become commonly adopted 10 years from now (once the dust settles around IEEE, LLVM's handling of NaNs, random community drift, etc.), can be a good thing.

LilithHafner · 2023-02-09T03:35:39Z

How would y'all feel about the proposal in the OP: document the returned payload as undefined?

oscardssmith · 2023-02-09T04:01:58Z

The word "undefined" is a little scary because people think of C UB, but documenting it as not stable between versions would be great.

vtjnash · 2023-02-09T12:35:16Z

C standards would call that unspecified behavior

StefanKarpinski · 2023-02-10T20:20:21Z

I think it would be fine to document it as not something that can be relied on, but still try to return the first NaN argument when possible. We can try to do that and decide later if it's worth it.

mikmoore · 2023-02-10T23:24:01Z

> but still try to return the first NaN argument when possible

I disagree. I would say "one of the NaN arguments when possible." Anything more than that is going to be untenable. For example, the following two operations are each implemented using a single native x86 instruction, yet they don't return the same positional operand when given two NaNs:

julia> x = reinterpret(Float64,-1); y = reinterpret(Float64,-2);

julia> reinterpret(Int, x+y) # vaddsd
-1

julia> reinterpret(Int, ifelse(x<y,x,y)) # vminsd
-2

Hardware does not take strong positions on this so supporting any positional preference would be a pain even on a single architecture (to say nothing of multiple). Plus, compilers are free to fiddle with some operations (e.g., a+b for b+a) so inlining and other factors can change behavior even with the hardware held constant.

StefanKarpinski · 2023-02-14T17:02:09Z

Yep, good point. One of the NaN arguments should be what we try to do.

workingjubilee · 2023-02-21T09:48:29Z

I feel I should note, to my great annoyance, that payload propagation is a should and not a shall according to IEEE754.

I think it is still wise to attempt it because it simplifies reasoning about a rather... quirky condition in a type. In particular, it means that you know exactly what the value is when you do a binary operation of any NaN and any non-NaN (generally: the NaN).

LilithHafner added the docs (This change adds or pertains to documentation) and maths (Mathematical functions) labels · Feb 4, 2023

LilithHafner mentioned this issue · Feb 9, 2023: Mangle NaN values returned from math functions #48616 (Closed)

bobcassels mentioned this issue · Feb 13, 2023: Make sind and cosd a little faster on IEEE floats #48668 (Open)

oscardssmith mentioned this issue · Apr 7, 2023: when x is NaN in trig functions return x rather than NaN #49285 (Merged)

mikmoore mentioned this issue · Apr 14, 2023: Floating point intrinsics are not IPO :consistent #49353 (Open)
