The documentation is quite paranoid that if you are dealing with untrusted inputs, you could parse two JSON numbers from the untrusted source just fine, and then performing an addition on them could fill up your memory. Exciting new DoS vector.
Of course in practice people end up parsing them into custom types with 64-bit integers, so this is only a problem if you are manipulating JSON directly, which is very rare in Haskell.
Sounds like Haskell made the right call: put warnings in the docs and steer the user in the right direction. Keeps implementation simple and users in control.
To the point of the article, serde_json support is improving in the next version of BigDecimal, so you'll be able to decorate your BigDecimal fields and it'll parse numeric fields from the JSON source, rather than json -> f64 -> BigDecimal.
    #[derive(Serialize, Deserialize)]
    pub struct MyStruct {
        #[serde(with = "bigdecimal::serde::json_num")]
        value: BigDecimal,
    }
Whether or not this is a good idea is debatable[^], but it's certainly something people have been asking for.[^] Is every part of your system, or your users' systems, going to parse with full precision?
Serde has an interface that allows failing. That one should fail. There is also another that panics, and AFAIK it will automatically panic on any parser that fails.
Do not try to handle huge values, do not pretend your parser is total, and do not pretend it's a correct value.
If you want to create a specialized parser that handles huge numbers, that's great. But any general one must fail on them.
8 billion digits (~100 bits?) is far more than should be used.
Would it be possible to use const generics to expose a `BigDecimal<N>` or `BigDecimal<MinExp, MaxExp, Precision>` type with bounded precision for serde, and disallow this unsafe `BigDecimal` entirely?
If not, I expect BigDecimal will be flagged in a CVE in the near future for causing a denial of service.
An 8 billion digit number is 2.5G? (Did I do my maths right?) All I need to do is shove 1,000 of those in a JSON array, and I'll cause an out-of-memory anyway.
On the other hand, any limit low enough that I can't blow up memory by making an array of 100K or so entries is going to be too low for some people (including me: I often work with numbers that have a few million digits).
Providing some way to set a limit seems sensible, but maybe just make a LimitedBigDecimal type, so that throughout the whole program there is a limit on how much memory BigDecimals can take up? (I haven't looked at the library in detail, sorry.)
My 2 cents anyway.
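A Python sketch of that kind of limit, using the json hooks so the cap applies before anything big gets allocated (MAX_DIGITS is an arbitrary choice, and bounded_decimal is a made-up helper, not part of any library):

```python
import json
from decimal import Decimal

MAX_DIGITS = 100  # arbitrary cap, purely for illustration

def bounded_decimal(literal: str) -> Decimal:
    # The hooks receive the raw literal text, so absurdly long
    # numbers can be rejected before allocating anything big.
    if len(literal) > MAX_DIGITS:
        raise ValueError(f"numeric literal longer than {MAX_DIGITS} chars")
    return Decimal(literal)

doc = json.loads('{"ok": 123.45}',
                 parse_float=bounded_decimal,
                 parse_int=bounded_decimal)
```

A Rust equivalent would need a custom deserializer, but the shape of the check is the same: bound the literal's length, then convert.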
So we had to go through all the code adding quotes to all the ID fields. That was a giant pain in my ass.
I don't follow. 1 - (Number.MAX_SAFE_INTEGER / 2^63) ~ 99.9%, so don't you have a >99% chance of generating an ID that gets truncated in js?
https://en.wikipedia.org/wiki/Double-precision_floating-poin...
That's still going to be a greater than 0.1% chance of hitting a non-representable value though.
Obviously, not all int64 values are representable in float64 (double).
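Easy to see in Python, for what it's worth:

```python
# The largest int64, 2**63 - 1, rounds up to 2.0**63 as a float64,
# so it comes back as a different integer.
n = 2**63 - 1
assert float(n) == 2.0**63
assert int(float(n)) != n

# Exactness is only guaranteed up to 2**53: one step past it,
# distinct integers collapse to the same double.
assert float(2**53) == float(2**53 + 1)
```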
nice pics here: https://en.wikipedia.org/wiki/Floating-point_arithmetic
I must admit I totally forgot about the JSON number issue. Our files include fields for various monetary amounts and similar, and in XML we just used "xs:decimal".
Most will be less than a million and require fewer than four decimal digits. But I guess someone might decide to use it for the equivalent of a space station or similar, ya never know...
Encoding numbers as strings because you are using a language and parser that can't deal with numbers properly (even 64-bit doubles) is a bit of a hack. Basically, the rest of the world giving up because JavaScript can't get its shit together is not a great plan.
As the article said
> RFC 8259 raises the important point that ultimately implementations decide what a JSON number is.
Any implementation dealing with money or monetary rates should know that it needs to deal with precision and act accordingly. If you want to use JavaScript to work with money, you need to get a library that allows you to represent high precision numbers. It's not unreasonable to also expect that you get a JSON parsing library that supports the same.
oh, TIL that you can support large numbers with the default JavaScript JSON library https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
But do note that there are multiple actively used currencies with zero, three, five (rare) or even eight (BTC) decimals, and that some denominations cannot be divided arbitrarily (e.g. only in steps of 0.5).
Point being: floats are dangerously naive for currency. But integers are naive too. You'll most probably want a "currency" or "money" type. Some Value Object, or even Domain Model.
XML offered all this, but in JSON there's little to convey this, other than some nested "object" with at least the decimal amt (as int), and the ISO4217 currency. And maybe -depending on how HATEOAS you wanna be- a formatting string to be used in locales, a rule on divisibility and/or how many decimal places your int or decimal might be.
(FWIW, I built backends for financial systems and apps. It gets worse than this if you do math on the currencies. Some legislations or bookkeeping rules state that calculation uses more or fewer decimals. E.g. that ($10/3)*3 == $10 vs == $9.99, or that $0.03/2 == $0.01 + $0.02, e.g. when splitting a bill. This stuff is complex, but real domain logic)
¹IANAL. But this was told when legal people looked at our architecture.
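Those rounding rules are straightforward to make explicit with Python's decimal module; a sketch (the ROUND_HALF_UP mode is an arbitrary choice here, real bookkeeping rules vary):

```python
from decimal import Decimal, ROUND_HALF_UP

cents = Decimal("0.01")

# Rounding each step to cents makes ($10/3)*3 come out as $9.99.
third = (Decimal("10") / 3).quantize(cents, rounding=ROUND_HALF_UP)
assert third == Decimal("3.33")
assert third * 3 == Decimal("9.99")

# Splitting $0.03: round one share, give the remainder to the other,
# so the shares still sum to the original amount.
total = Decimal("0.03")
a = (total / 2).quantize(cents, rounding=ROUND_HALF_UP)
b = total - a
assert (a, b) == (Decimal("0.02"), Decimal("0.01"))
assert a + b == total
```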
For example, you say you store monetary amounts as cents. What if you needed to store US gas prices, which are normally priced in amounts ending in 9/10ths of a cent? If you want to keep your values as integers you need to change your precision, which will likely mess up a lot of your code.
What's worse is that these things can also change over time and there is sometimes disagreement over what the canonical value is.
E.g. ISO 4217 (used by Safari, Firefox and NodeJS) will say that the Indonesian Rupiah (IDR) uses 2 decimal digits, while Unicode CLDR (used by Chrome) will say that they use 0 decimal digits. The former is the more "legalistic" definition, while the latter matches how people use the currency in reality.
This is not a real issue if you transfer amounts as decimal strings and then pass those to the Intl API for formatting (the formatting will just be different but still correct), but it's catastrophic if you use scaled-up integers (all amounts will be off by magnitudes).
For this reason I would always store currency amounts in an appropriate DECIMAL type in the DB and send currency amounts as strings over the wire.
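A minimal Python sketch of the strings-over-the-wire approach (the field names are made up):

```python
import json
from decimal import Decimal

# Hypothetical wire format: amounts travel as strings, not JSON numbers.
payload = json.dumps({"amount": str(Decimal("19.999")), "currency": "IDR"})

doc = json.loads(payload)
amount = Decimal(doc["amount"])  # exact, no float64 anywhere in the path
assert amount == Decimal("19.999")
```

The receiving side decides how many decimals to render; the wire value stays exact either way.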
It's not widely known, but US gasoline prices are actually in a defined currency unit, the mill (https://en.m.wikipedia.org/wiki/Mill_(currency)).
For most purposes, using mills as the base unit would be sufficient resolution.
My intuition tells me that "x * 1000 / 1000 == x" might not be true for all numbers if you're using floats.
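That intuition is provably right in at least one case; a quick Python check (the scan range is arbitrary):

```python
# Overflow gives a guaranteed counterexample: the intermediate
# product leaves the float64 range entirely.
x = 1e308
assert x * 1000 == float("inf")
assert x * 1000 / 1000 != x

# For finite values, the two roundings (one after *, one after /) can
# also disagree; this arbitrary scan just reports whether any turn up.
mismatches = sum(1 for i in range(1, 100_000)
                 if (i / 7) * 1000 / 1000 != i / 7)
print("finite mismatches in scan:", mismatches)
```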
    if (x == 0) ...

instead of something like

    if (abs(x) < eps) ...

where eps is a suitably defined small number.
https://pkg.go.dev/encoding/json#Decoder.UseNumber
That allows you to capture/forward numbers without any loss of precision.
To wit, Python's json module has `parse_float` and `parse_int` hooks:
https://docs.python.org/3/library/json.html#encoders-and-dec...
Example:
>>> json.loads('{"int":12345,"float":123.45}', parse_int=str, parse_float=str)
{'int': '12345', 'float': '123.45'}
FWIW, when I've cared about interop and controlled the schema, I've specified JSON strings for numbers, along with the range, precision, and representation. This is no worse (nor better) than using RFC 3339 for dates.

- “+1” (not a valid number, according to ECMA-404 and RFC 8259)
- “+0” (also not a valid number, but trickier than “+1” because IEEE floats have “+0” and “-0”)
- “070” (not a valid number, but may get parsed as octal, i.e. as 56)
- “1.” (not a valid number in json)
- “.1” (not a valid number in json)
- “0E-0” (a valid number in json)
There probably are others.
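For what it's worth, CPython's json module handles all of these according to the grammar:

```python
import json

# Each invalid literal above is rejected with a JSONDecodeError...
for bad in ["+1", "+0", "1.", ".1", "070"]:
    try:
        json.loads(bad)
        raise AssertionError(f"{bad!r} unexpectedly parsed")
    except json.JSONDecodeError:
        pass  # rejected, as the grammar requires

# ...while the valid "0E-0" parses to a float zero.
assert json.loads("0E-0") == 0.0
```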
I'm confused by this.
What is the precision of 0.1, relative to IEEE 754?
If I read it correctly, that statement is saying:
json_number_precision(json_number) <= ieee_754_precision
^ How do I calculate these values?

It does seem unclear what it means to exceed precision (given rounding is such an expected part of the way we use these numbers). Magnitude feels easier, as at least you definitely run out of bits in the exponent.
So including the string "0.1" in a message is fine because v = 0.1 implies 0.05 < v < 0.15, but including 0.100000000000000000000000000000000000 would not be.
That's what prometheus is doing for example. https://prometheus.io/docs/prometheus/latest/querying/api/
I mean that rapidjson (C++) parsed the string "0.99999999999999999" as the number 1.0000000000000003. Apart from just looking weird, it's a different float64 bit-pattern: 0x3FF0000000000000 vs 0x3FF0000000000001.
Similarly, serde-json (Rust) parsed "122.416294033786585" as 122.4162940337866. This isn't as obvious a difference, but the bit-patterns differ by one: 0x405E9AA48FBB2888 vs 0x405E9AA48FBB2889. Serde-json does have a "float_roundtrip" feature flag, but it's opt-in, not enabled by default.
For details, look for "rapidjson issue #1773" and "serde_json issue #707" at https://nigeltao.github.io/blog/2020/jsonptr.html
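CPython's float parsing is correctly rounded, so it can serve as a reference point here:

```python
import json
import struct

def bits(x: float) -> str:
    # Big-endian IEEE 754 binary64 bit pattern as hex.
    return struct.pack(">d", x).hex()

# A correctly rounded parser maps "0.99999999999999999" to exactly 1.0,
# i.e. the 0x3FF0000000000000 pattern, not rapidjson's off-by-one.
v = json.loads("0.99999999999999999")
assert v == 1.0
assert bits(v) == "3ff0000000000000"

# CPython's result for the serde_json literal; print the bit pattern
# to compare against the 0x405E9AA48FBB2888 / ...889 pair above.
w = json.loads("122.416294033786585")
print("0x" + bits(w).upper())
```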
Good thing there's msgpack I guess.
The only sane thing with JSON is to avoid numbers altogether and just use decimal-encoded strings. This forces the person parsing it on the other end to at least look up the actual limits defined by your schema.
At least 122.416294033786585 is between ...888 and ...889, though it's much closer to the former.
Say what you want about NaN, but IEEE 754 is the de facto way of dealing with floating point in computers, and even if NaNs and Infs are a bit "fringe", it's unfortunate that the most popular serialization format cannot represent them.
They're not really noisy, but if an application would work with some random noise added, it will probably work with floats; and if it wouldn't work with noise added, it's probably easier to just not use floats than to expect people to reason about IEEE details while risking subtle bugs if different float representations get mixed.
Of course I'm not doing a lot of high performance algorithms, I would imagine in some applications you really do need to reason about floats.
This "Averia Serif Libre" is unreadable for me.
> This specification allows implementations to set limits on the range
> and precision of numbers accepted. Since software that implements
> IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
> available and widely used, good interoperability can be achieved by
> implementations that expect no more precision or range than these
> provide, in the sense that implementations will approximate JSON
> numbers within the expected precision. A JSON number such as 1E400
> or 3.141592653589793238462643383279 may indicate potential
> interoperability problems, since it suggests that the software that
> created it expects receiving software to have greater capabilities
> for numeric magnitude and precision than is widely available.
> Note that when such software is used, numbers that are integers and
> are in the range [-(2**53)+1, (2**53)-1] are interoperable in the
> sense that implementations will agree exactly on their numeric
> values.
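That [-(2**53)+1, (2**53)-1] boundary is easy to check in Python:

```python
# Every integer in the RFC's interoperable range has an exact
# binary64 representation...
assert float(2**53 - 1) == 2**53 - 1

# ...but one step past it, distinct integers collapse together.
assert float(2**53) == float(2**53 + 1)
```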
And yes, this is completely insane for a format that is supposed to be specifically for serialization and interop. Needless to say, the industry has enthusiastically adopted it to the point where it became the standard.

I miss XML these days. Sure, it was verbose and had a bunch of different and probably excessive numeric types defined for XML Schema... but at least they were well-defined (https://www.w3.org/TR/xmlschema-2/#built-in-datatypes). And, on the other hand, without a schema, all you had were strings. Either way, no mismatched expectations.
    $ python3
    Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 100000.000000000017
    100000.00000000001

    Python 3.8.10 (default, Nov 22 2023, 10:22:35)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal
    >>> Decimal('100000.000000000017')
    Decimal('100000.000000000017')
For example:

    >>> import json
    >>> json.loads('{"a": 100000.000000000017}')
    {'a': 100000.00000000001}
    >>> json.loads('{"a": 100000.000000000017}', parse_float=Decimal)
    {'a': Decimal('100000.000000000017')}

Alright, if "you" have only ever used Python. In C, for example, we have hexadecimal floating-point literals that represent all floats and doubles exactly (including the infinities and NaNs that make the JSON parser fail miserably).
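Python actually exposes the same exact representation as C's hex float literals, via float.hex() and float.fromhex():

```python
import sys

# float.hex() is Python's counterpart to C's hexadecimal float
# literals: an exact, lossless textual form of every finite double.
x = 0.1
h = x.hex()
assert h == "0x1.999999999999ap-4"
assert float.fromhex(h) == x

# The largest finite double round-trips too.
assert float.fromhex("0x1.fffffffffffffp+1023") == sys.float_info.max
```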
Although it would be good to move in the direction of using a BigDecimal equivalent by default when ingesting unknown data.
- Array
- Binary
- Date
- Decimal128
- Document
- Double
- Int32
- Int64
- MaxKey
- MinKey
- ObjectId
- Regular Expression
- Timestamp

https://www.mongodb.com/docs/manual/reference/mongodb-extend...

It also specs proper decimal values, mitigating the issues presented in the OP.