derefr 2y ago

> proper round-tripping of float values

Why do so many (all?) textual data serialization formats represent floats in base-10 scientific notation, anyway?

If we wanted floats that are 1. human-editable but 2. bijective with IEEE754, wouldn't floating-point hexadecimal (and "e" notation representing a base-2 exponent) be a better idea?
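For illustration, here is a minimal sketch of what such a hex round trip looks like, using only Python's built-in `float.hex` and `float.fromhex`. One caveat: existing hex-float syntaxes (C99's `%a`, Python's `float.hex`) mark the base-2 exponent with `p` rather than `e`, since `e` is itself a hex digit. The helper names `to_hex_text` and `from_hex_text` are hypothetical, chosen only for this example.

```python
def to_hex_text(x: float) -> str:
    """Serialize a double as hexadecimal floating point, e.g. '0x1.8p+1'."""
    return x.hex()


def from_hex_text(text: str) -> float:
    """Parse hexadecimal floating-point text back into a double."""
    return float.fromhex(text)


if __name__ == "__main__":
    for value in (0.1, 3.0, 5000.375):
        text = to_hex_text(value)
        # Each hex digit maps to exactly four mantissa bits, so no rounding
        # happens in either direction.
        print(value, "->", text, "->", from_hex_text(text))
```

The trade-off raised in the thread is visible in the output: the text is exact and editable, but `0x1.999999999999ap-4` is much harder to recognize as 0.1 at a glance.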

andyferris 2y ago

> human-editable

I mean, depends on the human. Most don't know hexadecimal, but know what 3.14 means.

The real issue is why do so many float parsers and printers fail to do exact round tripping? Designing a good algorithm for this was a bit difficult, but these days this is a solved problem.

johnnyanmac 2y ago

If I had to take a slightly snide guess: because these are low-level tools, so there's a 90% chance that these parsers/printers are written in C, or ultimately depend on C implementations. As any C programmer would know, C loves to throw "undefined behavior" at any problem it doesn't bother to document. Which is a lot.

That, combined with almost zero package management for retrieving things that were solved decades ago, means we keep running into this issue, partially because of the mindset of C programmers.

account42 2y ago

This is just "hur dur C undefined lol" level of a comment.

If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse. If that means implementations have to ship their own float formatter and parser then so be it - no one is tied to whatever comes with their libc, package manager or no package manager.

johnnyanmac 2y ago

I did say slightly snide.

But it isn't just about undefined behavior, it's more about the culture of C and how it approaches package management and sharing (or in this case, doesn't). Even if C had Rust-level correctness checking, it would have the same issue.

> If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse.

Well I guess we have our answer in the case of seriousness. I'm guessing it didn't matter enough for the implementers, or it did matter but could never actually get it implemented. The reasons for this are numerous, contextual (with context we'll never have), and probably not rooted in technical reasoning.

We are talking about the domain of animation and games, after all. Not mission-critical code. There's more wiggle room, especially given the complexity of media around the time the format was being developed.

pixelesque 2y ago

Some do (e.g. Nuke compositor .nk files in the VFX industry, which store some float values as hex)...

but then you lose the human readability / "understand-ability at a glance" advantage, so it sort of depends what the use-case is...

derefr (OP) 2y ago

The best of both worlds, at least in my opinion, would be to write a float as a polynomial in two parts (where either part alone is still a float): an integer part with an optional scientific-notation exponent; and a fractional part, where the fraction's denominator is always a power of two.

So 5e3 is a float; 3/8 is a float; and 5e3+3/8 is a float. Each cleanly and exactly represents a particular IEEE754 value, while also being readable as a base-10 polynomial.

Maybe fractions of arithmetically-specified powers of two could also be allowed, for really big denominators. 3/2**26, for example.
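To make the proposal concrete, here is a rough sketch of a parser for this notation, under several assumptions that are not in the comment itself: parts are joined with `+`, the integer part may carry a non-negative `e` exponent, and the denominator is either a literal power of two or written as `2**k`. The names `parse_part` and `parse_poly_float` are hypothetical. Exact rational arithmetic from Python's `fractions` module is used so that the parser can refuse text that does not name an IEEE 754 double exactly.

```python
from fractions import Fraction


def parse_part(part: str) -> Fraction:
    """Parse one term: 'N', 'NeK' (K >= 0), 'N/D' or 'N/2**K'."""
    if "/" in part:
        num, den = part.split("/")
        denom = 2 ** int(den[3:]) if den.startswith("2**") else int(den)
        # A positive power of two has exactly one bit set.
        if denom <= 0 or denom & (denom - 1):
            raise ValueError(f"denominator in {part!r} must be a power of two")
        return Fraction(int(num), denom)
    if "e" in part:
        mantissa, exponent = part.split("e")
        if int(exponent) < 0:
            raise ValueError("assumed grammar: integer part needs exponent >= 0")
        return Fraction(int(mantissa) * 10 ** int(exponent))
    return Fraction(int(part))


def parse_poly_float(text: str) -> float:
    """Parse e.g. '5e3+3/8' and insist that the result is an exact double."""
    exact = sum((parse_part(p) for p in text.split("+")), Fraction(0))
    value = float(exact)
    if Fraction(value) != exact:
        raise ValueError(f"{text!r} is not exactly representable as a double")
    return value


if __name__ == "__main__":
    print(parse_poly_float("5e3+3/8"))   # 5000.375
    print(parse_poly_float("3/2**26"))
```

One thing the sketch surfaces: not every string in this grammar is an exact double. `1e23`, for instance, is rejected by the exactness check, so a real format would need a rule for such inputs (reject them, or round and give up strict bijectivity).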

pixelesque 2y ago

That's not particularly user-friendly though: at least for CG/VFX software (where USD came from and is designed for), non-technical (at least in terms of understanding IEEE floats) people like artists often want to look at the values to verify stuff for 'debugging' (i.e. is the software tool I'm using actually exporting the correct values I selected in the UI params panel).

Having to do any form of interpretation (even scientific notation is not ideal in some cases) is not great for many users.

derefr (OP) 2y ago

This would seem to point to a fundamental impedance mismatch between textual dump formats used as debugging aids, and textual project file formats that are human-readable so that text-based tools can process the project's low-level data structures before it's loaded back in.

Most OOP languages have a "debug print" or "shell inspect" method that the programmer can override, where by default the method will print something that's valid language syntax, but where the overrides aren't required or expected to be such, and instead should concisely describe the object at the expense of being reloadable. These same languages usually also have support for custom serializers for text-based serialization formats like JSON. The serializer implementation for JSON, and the serializer implementation for "shell inspect", are rarely identical.

I think what a CG/VFX artist would want here isn't that the canonical textual file format "for import" gives them decimal-serialized floats, but rather that they have the option to "inspect" the project, resulting in a view that looks like e.g. https://www.tonymacx86.com/media/ioregistryexplorer.187440/f... (a hierarchically expandable "shell inspect" of the project). It makes perfect sense for the floats in such a read-only, debugging-oriented view to be rendered in decimal (esp. if a raw canonical binary-data representation is given in parentheses beside the rendered value).
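As a small sketch of that split, assuming Python and two hypothetical helpers: `serialize_float` produces the exact, reloadable text for the file format, while `inspect_float` produces the read-only view, with the decimal rendering first and the raw big-endian IEEE 754 bits in parentheses.

```python
import struct


def serialize_float(x: float) -> str:
    """Canonical, reloadable form for the project file (hex float text)."""
    return x.hex()


def inspect_float(x: float) -> str:
    """Read-only debugging form: decimal, then the raw 64-bit pattern."""
    raw_bits = struct.pack(">d", x).hex()  # big-endian IEEE 754 double
    return f"{x!r} (0x{raw_bits})"


if __name__ == "__main__":
    for value in (1.0, 0.5, 3.14):
        print(f"file: {serialize_float(value):<24} inspect: {inspect_float(value)}")
```

Only `serialize_float` needs an inverse; the inspect string is free to change for readability without affecting saved projects.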

Daub 2y ago

I love that a Nuke file can (usually) be passed around using no more than copy/pasted text.

mananaysiempre 2y ago

FWIW, well-implemented round-to-nearest conversion routines (e.g. Python and IIRC Glibc, although MSVC is historically bad about this) will roundtrip IEEE > decimal > IEEE if you use the correct number of digits (at least 9 for singles and 17 for doubles), for reasons of mathematics and not implementation. (The other way around also works, barring exponent under- and overflow, but for at most 6 and 15 digits respectively, so I wouldn’t call it bijective, strictly speaking.)
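A quick way to see the digit counts in action is a check like the following sketch. It relies on CPython's correctly rounded `format` and `float`, and uses `struct` to emulate single precision, since Python floats are doubles. The helper names are hypothetical.

```python
import math
import random
import struct


def roundtrips_double(x: float, digits: int = 17) -> bool:
    """True if x survives double -> decimal text -> double at this precision."""
    return float(format(x, f".{digits}g")) == x


def as_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def roundtrips_single(s: float, digits: int = 9) -> bool:
    """True if a single-precision value survives single -> decimal -> single."""
    return as_single(float(format(s, f".{digits}g"))) == s


def random_double(rng: random.Random) -> float:
    """A double built from 64 random bits, so all exponents get exercised."""
    return struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))[0]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [random_double(rng) for _ in range(10_000)]
    samples = [x for x in samples if math.isfinite(x)]
    print(all(roundtrips_double(x) for x in samples))  # 17 digits: True
    print(roundtrips_double(0.1 + 0.2, digits=15))     # 15 digits: False
    print(roundtrips_single(as_single(0.1)))           # 9 digits: True
```

The 15-digit failure on `0.1 + 0.2` is the familiar case where the text collapses to `0.3`, which parses to a different double.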

The conversions are not even that hard ... unless you want to deal with arbitrary (and arbitrarily long) decimal representations and not just those that arise from IEEE numbers. Essentially the only choice to make is whether the conversion to decimal will emit all the digits all the time (simpler) or the shortest number of digits that will round to the requested IEEE float when read back (less liable to be mocked in webcomics[1]).
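The two choices can be compared with a short sketch. `all_digits` always emits 17 significant digits; `shortest_by_search` is a deliberately naive stand-in that tries increasing precisions until the text reads back to the same double. It is not how production printers work (they use dedicated shortest-digit algorithms), and in rare cases the correctly rounded n-digit string it finds may not be the true shortest one. Both function names are hypothetical. Python's own `repr` already produces a shortest round-tripping form.

```python
def all_digits(x: float) -> str:
    """Simple policy: always print 17 significant digits."""
    return format(x, ".17g")


def shortest_by_search(x: float) -> str:
    """Naive policy: first precision whose text reads back as the same double."""
    for digits in range(1, 18):
        text = format(x, f".{digits}g")
        if float(text) == x:
            return text
    # Only NaN reaches this point, since NaN never compares equal to itself.
    raise ValueError("no round-tripping decimal text for NaN")


if __name__ == "__main__":
    x = 0.1
    print(all_digits(x))          # 0.10000000000000001
    print(shortest_by_search(x))  # 0.1
    print(repr(x))                # 0.1
    # Both strings denote the same double when read back.
    print(float(all_digits(x)) == float(shortest_by_search(x)))
```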

Of course, using hex floats is much simpler than even the simplest implementation of the above; I just want to point out that IEEE floats are perfectly roundtrippable through decimal.

[1] https://www.smbc-comics.com/comic/2013-06-05
