Demystifying NaN for the Working Programmer

(lucidchart.com)

57 points, Zacru, 4y ago, 36 comments

36 comments

pierrebai 4y ago

NaN is a cancer. The choice that NaN == NaN is false is just wrong. Every type, every variable can have multiple reasons for being invalid. Yet no other type has ever chosen to make invalid values not equal to themselves.

Pointers can be invalid. They can be invalid for any number of reasons: lack of memory, object not found, etc. No one ever suggests that null should not equal null.

File handles can be invalid. They can be invalid for any number of reasons: file not found, access denied, file server is offline. No one has ever made invalid handles not equal to themselves.

The justification for NaN not being equal to themselves is just bonk.

ynik 4y ago

In a world without generic programming, NaN not being equal to itself makes a certain amount of sense for some kinds of numeric code. But in a world with reusable generic algorithms the calculation changes -- here equality/ordering relations really must be transitive or weird shit happens. In C++ it's undefined behavior to call `std::sort` or `std::unique` on a list of floats containing NaN.

Most languages nowadays have standard-library functions/types that require well-behaved equality, so why have a builtin type for which equality is not well-behaved?
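
One common workaround can be sketched in Python. This is only an illustration: `sort_nans_last` is a made-up helper name, and putting NaNs at the end is an arbitrary policy choice, not something IEEE 754 or any library prescribes. The sort key is built so that a NaN is never compared against anything.

```python
import math

def sort_nans_last(values):
    """Sort floats ascending, with every NaN moved to the end.

    The key never compares a NaN: each NaN is replaced by a placeholder
    0.0 and kept apart from real numbers by the leading boolean flag.
    """
    return sorted(
        values,
        key=lambda x: (math.isnan(x), 0.0 if math.isnan(x) else x),
    )

data = [3.0, math.nan, 1.0, 2.0]
result = sort_nans_last(data)
print(result)  # [1.0, 2.0, 3.0, nan]
```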

Lascaille 4y ago

>The justification for NaN not being equal to themselves is just bonk.

It makes a lot of sense to me. NaN indicates data has been lost. You did something and you stored the result in a number datatype but the result isn't a number. Data was lost. You lost the data and have only 'your answer wasn't a number.'

Comparing NaN with NaN is asking the computer 'we have two buckets that have overflowed, were their contents the same?' The answer is 'we don't know' which means, to err on the side of safety, the answer is 'no.'

No?

Dylan16807 4y ago

Let's say you make a particular NaN equal to itself.

But then it's sensible for different operations to give you different NaN values.

And you still wouldn't say that 4 < NaN is true, or NaN < 4 is true, would you?

So it's still going to confuse the user. Is just changing equality going to give you a better system overall?

Retric 4y ago

Infinity is NaN, but 4 < infinity is true.

Dylan16807 4y ago

Signed infinity represents aggressive rounding, but you still know roughly what number it is. It works out well to let it still participate in equality and ordering. NaN can be created in many different ways and there is no way to say basically anything about what numbers it could be related to.

You could arbitrarily make NaN sort as if it was a certain value, and that would be useful when you want to sort a big array, but it would have unpleasant side effects when you're doing math. IEEE decided "always false" was less likely to cause problems, but to be clear you get problems no matter what you choose.

afiori 4y ago

infinity is not NaN

Fire-Dragon-DoL 4y ago

Note (without disagreeing): in SQL, NULL != NULL.

jameshart 4y ago

This article conflates the representational limits of floating point with the concept of NaN in a way that I suspect will lead to more confusion, not less.

Zero/zero doesn’t return NaN because it isn’t representable within floating point - it returns NaN because it is an expression that has no mathematical meaning.

The fact that sqrt(-1) has two valid nonreal answers has nothing to do with why it returns NaN - after all, sqrt(4) has two valid real answers so is also technically not representable by a single floating point value, but that doesn’t typically result in NaN.

NaN is just an error value you get when you ask floating point math a dumb question it can’t usefully answer.

Far more interesting and subtle are the ways in which positive and negative infinity and positive and negative zero let you actually still obtain useful (at least for purposes of things like comparison) results to certain calculations even if they overflow the representable range.
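
A small Python sketch of that last point, assuming ordinary IEEE 754 doubles (which is what CPython's `float` uses on mainstream platforms). The variable names are only illustrative.

```python
import math

big = 1e308 * 10          # overflows the double range and becomes +inf
tiny = -1e-200 * 1e-200   # underflows and becomes -0.0

overflow_is_inf = math.isinf(big)
still_ordered = big > 1.7e308          # +inf still compares above any finite value
zeros_equal = (tiny == 0.0)            # -0.0 compares equal to 0.0
sign_kept = math.copysign(1.0, tiny)   # the sign of the underflowed result survives

print(overflow_is_inf, still_ordered, zeros_equal, sign_kept)  # True True True -1.0
```

So even after leaving the representable range, the results still answer questions like "is it bigger than this?" or "which side of zero did it come from?".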

saagarjha 4y ago

> The only reliable way to test for NaN is to use a language-dependent built-in function; the expression a === NaN is always false

Well, you test for it by comparing the value against itself and seeing if that returns false.
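
In Python, for example, the self-comparison test could look like this (`is_nan` is a hypothetical helper name; the standard library already offers `math.isnan` for the same job):

```python
import math

def is_nan(x: float) -> bool:
    """True only for NaN, the one float value that is not equal to itself."""
    return x != x

print(is_nan(float("nan")))  # True
print(is_nan(1.5))           # False
print(is_nan(math.inf))      # False
```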

(There’s also a bit of confusion on by value vs. by reference comparison and the actual bit value on a NaN, which isn’t quite right.)

ithkuil 4y ago

Signaling NaNs raise exceptions in some operations. Is comparison one of these?

stephencanon 4y ago

They “raise exceptions” in the IEEE 754 sense, which is not at all the same thing as what most programming languages mean by “raise exception”. It means that they set a sticky flag in a register that may be queried at a later point, not that program control flow is redirected.

pletnes 4y ago

The only use I saw for this is that you can enable compiler flags to crash the program when NaNs are encountered. Useful for testing Fortran code, in my experience. I didn’t see any support for other languages I’ve used.

olliej 4y ago

I dislike this article, as it tries repeatedly to imply that the use of NaN is somehow a restriction caused by floating point.

No ieee754 operation ever produces a NaN result unless the operation has no valid result in the set of real values.

Similarly the behaviour in comparisons: if you want NaN to equal NaN you have to come up with a definition of equality that is also consistent with

    NaN < X

    NaN > X

    NaN == X

The logical result of this is that NaN does not equal itself, and I believe mathematicians agree on that definition. Again not a result of the representation, but a result of the mathematical rules of real values.
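
For reference, this is how those comparisons come out in practice. The snippet is a Python illustration; the variable names are arbitrary.

```python
import math

nan = math.nan
x = 4.0

# Every ordered comparison and every equality test involving NaN is false.
comparisons = [nan < x, nan > x, nan <= x, nan >= x, nan == x, nan == nan]
print(comparisons)   # [False, False, False, False, False, False]

# The only comparison that yields true is inequality.
print(nan != nan)    # True
```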

I want to be very clear here: floating point math always produces the correct value rounded (according to rounding mode) to the appropriate value in the represented space unless it is fundamentally not possible. The only place where floating point (or indeed any finite representation) produces an incorrectly rounded result is the transcendental functions, where some values can only be correctly rounded if you compute the exact value, but the exact value is irrational.

People seem hell bent on complaining about floating point behavior, but it is fundamentally mathematically sound. IEEE754 also specifies some functions like e^x-1 explicitly to ensure that you get the best possible accuracy for the core arithmetic operations
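
The e^x-1 case is easy to see from Python, whose `math.expm1` exposes that function. This is a rough sketch: the "reference" value is just the first two Taylor terms, which is an assumption that holds for an input this small.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0   # exp(x) is rounded to a value near 1, so subtracting loses digits
accurate = math.expm1(x)    # computes e**x - 1 directly

# Taylor series x + x^2/2; the remaining terms are far below double precision here.
reference = x + x * x / 2
naive_err = abs(naive - reference) / reference
accurate_err = abs(accurate - reference) / reference

print(naive_err > 1e-9, accurate_err < 1e-15)  # True True
```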

dzaima 4y ago

greater-than and less-than already make no sense around NaN, you won't get much worse, I don't get what you're trying to point out with them. This is less a question about mathematical correctness (which there isn't much around NaN anyway), but more practical. There being this annoying NaN that breaks everything if it's in an array to be sorted or in a set or a key in a map is just pure awful.

olliej 4y ago

Correct they don’t make sense, but given < and > return a Boolean in the ieee environment they need to produce a deterministic value.

As you say relations with NaN don’t make sense, but given the requirement of a single value NaN != NaN makes the most “sense” mathematically, and a core principle of ieee754 was ensuring the most accurate rendition of true maths with a finite representation (see a bunch of papers by Kahan).

Of course x87’s ieee754 implementation does actually have multiple NaNs, infinities, and representations of the same value. For all its quirks, remember x87 was what demonstrated that the ieee754 specification could be made fast and affordable, which non-Intel manufacturers were all claiming was impossible. The only real “flaw”* in x87 was the explicit leading 1, which was an artifact of Intel being sufficiently ahead of the curve to predate dropping it.

* the x87 transcendentals are known to be hopelessly inaccurate, but that in theory could have been fixed, whereas the format could not be.

dzaima 4y ago

mathematically, yes. In practice, NaN!=NaN just kills any hope of having any amount of sanity for operations that don't care about floating-point and just want to generally compare things. It's not very nice to say "sorting, hashmaps & hashsets containing NaNs cause the entire operation/structure to be completely undefined behavior", especially given that NaNs kind of exist to allow noticing errors, not cause even more of them.
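
Python is one language where the result is defined but still surprising. The sketch below shows CPython's behaviour for sets and dicts; the identity shortcut it relies on is specific to Python's containers, and other languages behave differently.

```python
a = float("nan")
b = float("nan")   # a second, distinct NaN object

seen = {a, b}            # two entries, because a and b never compare equal
table = {a: "first"}

lookup_same_object = a in table   # True: containers check identity before equality
lookup_other_nan = b in table     # False: a different NaN object is never found

print(len(seen), lookup_same_object, lookup_other_nan)  # 2 True False
```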

bryanrasmussen 4y ago

I did a code assignment for a potential JavaScript-heavy job in 2014. For some reason I think isNaN was part of the language then, because I have a memory of deciding not to use it (but could be misremembering); at any rate I did Number(x) !== Number(x) at some point.

In the meeting when they went over the code the guy who did it said we were wondering why you did this? So I had to explain NaN to him. He really did not know it existed. At any rate I thought this is a weird thing not to know anything at all about.

pletnes 4y ago

Related: I’ve met developers who think NaNs are a language or library (notably pandas) feature.

amelius 4y ago

Imagine doing if(x) ..., where x can be NaN. Shouldn't that throw an exception in most cases? Why are our compilers not doing it that way?

xen0 4y ago

Should it? It isn't obvious to me at all that throwing an exception in this case is the best behaviour. Throwing an exception when testing a value for 'truthiness' is extremely surprising.

On the other hand, I would strongly discourage 'if(x)' where x is a float that may be NaN purely because the 'correct' behaviour here isn't clear to me.
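
To illustrate how unclear it is: Python treats NaN as truthy (a float is falsy only when it equals zero), while JavaScript treats NaN as falsy. A Python sketch of the explicit alternative follows; `describe` is a made-up function for illustration.

```python
import math

x = math.nan
truthy = bool(x)   # True in Python: NaN is not equal to zero

def describe(value: float) -> str:
    """Branch on NaN explicitly instead of relying on truthiness."""
    if math.isnan(value):
        return "not a number"
    return "nonzero" if value else "zero"

print(truthy)            # True
print(describe(x))       # not a number
print(describe(0.0))     # zero
print(describe(2.5))     # nonzero
```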

amelius 4y ago

How about the case where x is (y > 0)? If y is NaN, shouldn't x be boolean-NaN? And shouldn't if(x) throw an exception? Or shouldn't (y > 0) throw an exception if you don't want boolean-NaNs?

xen0 4y ago

That's easy: y > 0 is False, not NaN.

You may not think this is wise, but this is very much how comparisons with NaN are defined.

And I think this is better than exception raising. Again, I think it would be _really_ weird for simple value comparisons to throw.

ElevenLathe 4y ago

The compiler presumably can't know in most cases, but the runtime might be able to throw. It depends on the language implementation and the tradeoffs.

mrlonglong 4y ago

Excellent article, this helped me understand the issues working with floating numbers. I work with them quite a lot when developing business logic and often times NaN can be a pain. Understanding why helps a lot.

PopePompus 4y ago

I love NaNs, especially their "infectious" quality. Initializing float variables to NaN before first assignment can make a lot of errors immediately obvious. I wish there were a NaN for integers.
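
A tiny Python sketch of the idea, with invented variable names: the forgotten assignment shows up in the final result instead of silently producing a plausible number.

```python
import math

scale = math.nan   # placeholder until a real value is assigned

readings = [1.0, 2.0, 3.0]
# Bug: scale was never set, so every product (and the sum) becomes NaN.
total = sum(r * scale for r in readings)

print(math.isnan(total))  # True
```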

colejohnson66 4y ago

What about a “nullable” double? In C#, you’d use `double?`, Rust would be Option<f64>, C++ would be std::optional<double>. Then any operation would throw upon an unset value?

olliej 4y ago

That would require every operation on a floating point value to return an optional, which you’d then need to unwrap and branch on.
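
A rough Python analogue of that cost, using `Optional[float]` with `None` as the unset value. The helper names `divide` and `add` are hypothetical; the point is only that each operation needs its own check.

```python
from typing import Optional

def divide(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Divide two optional floats; None if either is unset or b is zero."""
    if a is None or b is None or b == 0.0:
        return None
    return a / b

def add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Add two optional floats; None if either is unset."""
    if a is None or b is None:
        return None
    return a + b

# Each step has to unwrap and branch before it can do any arithmetic.
result = add(divide(1.0, 0.0), 2.0)
print(result)                        # None
print(add(divide(6.0, 3.0), 2.0))    # 4.0
```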

saagarjha 4y ago

Don’t initialize them and turn on UBSan :)
