Why do so many (all?) textual data serialization formats represent floats in base-10 scientific notation, anyway?
If we wanted floats that are 1. human-editable but 2. bijective with IEEE754, wouldn't floating-point hexadecimal (and "e" notation representing a base-2 exponent) be a better idea?
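For what it's worth, this notation already exists: C99's `%a` and Python's `float.hex()` write the significand in hexadecimal with a base-2 exponent, using `p` rather than `e` as the exponent marker because `e` is itself a hex digit. A minimal Python sketch (the helper names `to_hex_text` and `from_hex_text` are made up for illustration):

```python
def to_hex_text(x: float) -> str:
    """Serialize a float as a hexadecimal floating-point literal."""
    return x.hex()


def from_hex_text(s: str) -> float:
    """Parse a hexadecimal floating-point literal back into a float."""
    return float.fromhex(s)


for value in (0.1, 3.14, 5e3, 2.0 ** -1074):
    text = to_hex_text(value)
    # The mapping is exact in both directions: no decimal rounding is involved.
    assert from_hex_text(text) == value
    print(value, "->", text)
```

Here 0.1 comes out as `0x1.999999999999ap-4`, which is exact but arguably not something most people would edit by hand.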
I mean, depends on the human. Most don't know hexadecimal, but know what 3.14 means.
The real question is why so many float parsers and printers fail to round-trip exactly. Designing a good algorithm for this was a bit difficult, but these days it is a solved problem.
That, combined with almost zero package management for retrieving things that were solved decades ago, means we keep running into this issue, partially because of the mindset of C programmers.
If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse. If that means implementations have to ship their own float formatter and parser, then so be it - no one is tied to whatever comes with their libc, package manager or no package manager.
But it isn't just about undefined behavior, it's more about the culture of C and how it approaches package management and sharing (or in this case, doesn't). Even if C had Rust-level correctness checking, it would have the same issue.
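As a hedged illustration of what "exact round-tripping" means in practice: Python's `repr` for floats has produced the shortest decimal string that parses back to the same value since 3.1, so a bit-level check like the one below passes. The helper names are hypothetical, and the random sampling is only a spot check, not a proof.

```python
import random
import struct


def bits(x: float) -> int:
    """Return the raw IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def round_trips(x: float) -> bool:
    """True if printing x to decimal text and parsing it back gives the same bits."""
    return bits(float(repr(x))) == bits(x)


def random_finite_double(rng: random.Random) -> float:
    """Draw a double from a random 64-bit pattern, skipping NaN and infinity."""
    while True:
        pattern = rng.getrandbits(64)
        x = struct.unpack("<d", struct.pack("<Q", pattern))[0]
        if x == x and x not in (float("inf"), float("-inf")):
            return x


rng = random.Random(0)
assert all(round_trips(random_finite_double(rng)) for _ in range(10_000))
```

Comparing bit patterns rather than using `==` on the floats means that `-0.0` and `0.0` are not treated as the same value.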
>If you are serious about your data format supporting round tripping you can and should specify the precise ASCII encoding of binary floats and the inverse.
Well, I guess we have our answer as to seriousness. I'm guessing it either didn't matter enough to the implementers, or it did matter but they could never actually get it implemented. The reasons for this are numerous, contextual (with context we'll never have), and probably not rooted in technical reasoning.
We are talking about the domain of animation and games, after all. Not mission critical code. There's more wiggle room, especially given the complexity of media around the time the format was being developed.
But then you lose the human readability / "understand-ability at a glance" advantage, so it sort of depends what the use-case is...
So 5e3 is a float; 3/8 is a float; and 5e3+3/8 is a float. Each cleanly and exactly representing a particular IEEE754 value, while also being readable as a base-10 polynomial.
Maybe fractions of arithmetically-specified powers of two could also be allowed, for really big denominators. 3/2**26, for example.
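The fraction idea rests on a fact that is easy to check: every finite IEEE 754 value is exactly an integer divided by a power of two. A small Python sketch using the standard `fractions` module (the function `is_exact` is an illustrative name, not part of any proposal in this thread):

```python
from fractions import Fraction


def is_exact(q: Fraction) -> bool:
    """True if the rational q is exactly representable as a binary64 float."""
    try:
        return Fraction(float(q)) == q
    except OverflowError:
        # q is too large in magnitude to convert to a float at all.
        return False


# 5e3 + 3/8 is exactly 40003/8, and the denominator is a power of two.
value = Fraction(5000) + Fraction(3, 8)
assert value == Fraction(40003, 8)
assert is_exact(value) and float(value) == 5000.375

# A large power-of-two denominator such as 3/2**26 is also exact.
assert is_exact(Fraction(3, 2 ** 26))

# 1/10 is not: the nearest double is a different rational.
assert not is_exact(Fraction(1, 10))
print((0.1).as_integer_ratio())
```

The last line prints the numerator and denominator actually stored for 0.1; the denominator is 2**55, which shows why a format like this would have to reject or round a literal such as 1/10.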
Having to do any form of interpretation (even scientific notation is not ideal in some cases) is not great for many users.
Most OOP languages have a "debug print" or "shell inspect" method that the programmer can override, where by default the method will print something that's valid language syntax, but where the overrides aren't required or expected to be such, and instead should concisely describe the object at the expense of being reloadable. These same languages usually also have support for custom serializers for text-based serialization formats like JSON. The serializer implementation for JSON, and the serializer implementation for "shell inspect", are rarely identical.
I think what a CG/VFX artist would want here isn't that the canonical textual file format "for import" gives them decimal-serialized floats, but rather that they have the option to "inspect" the project, resulting in a view that looks like e.g. https://www.tonymacx86.com/media/ioregistryexplorer.187440/f... (a hierarchically expandable "shell inspect" of the project). It makes perfect sense for the floats in such a read-only, debugging-oriented view to be rendered in decimal (especially if a raw canonical binary-data representation is given in parentheses beside the rendered value).
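A rough Python sketch of that kind of inspect rendering, with the decimal value for reading and the raw binary64 bits in parentheses for exactness. The function name and the output layout are assumptions made for illustration, not any tool's actual format.

```python
import struct


def inspect_float(x: float) -> str:
    """Render x as readable decimal followed by its raw big-endian binary64 bits."""
    raw = struct.pack(">d", x).hex()
    return f"{x!r} (0x{raw})"


print(inspect_float(1.0))  # 1.0 (0x3ff0000000000000)
print(inspect_float(0.1))  # 0.1 (0x3fb999999999999a)
```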
The conversions are not even that hard ... unless you want to deal with arbitrary (and arbitrarily long) decimal representations and not just those that arise from IEEE numbers. Essentially the only choice to make is whether the conversion to decimal will emit all the digits all the time (simpler) or the shortest number of digits that will round to the requested IEEE float when read back (less liable to be mocked in webcomics[1]).
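The two printing strategies can be compared directly in Python, as a rough sketch: 17 significant digits are always enough to round-trip a binary64 value, while `repr` emits the shortest string that does. The function names here are illustrative only.

```python
def all_digits(x: float) -> str:
    """Print up to 17 significant digits, enough for any binary64 value."""
    return format(x, ".17g")


def shortest_digits(x: float) -> str:
    """Print the fewest digits that still parse back to x (CPython's repr)."""
    return repr(x)


for x in (0.1, 0.1 + 0.2, 1 / 3, 5e3 + 3 / 8):
    long_form, short_form = all_digits(x), shortest_digits(x)
    # Both strings recover the original value exactly.
    assert float(long_form) == x and float(short_form) == x
    print(f"{short_form:>22}  {long_form}")
```

For 0.1 the first strategy prints `0.10000000000000001` and the second prints `0.1`; both read back as the same double.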
Of course, using hex floats is much simpler than even the simplest implementation of the above; I just want to point out that IEEE floats are perfectly roundtrippable through decimal.