The documentation is quite paranoid that if you are dealing with untrusted inputs, you could parse two JSON numbers from the untrusted source just fine, and then performing an addition on them could fill up your memory. Exciting new DoS vector.
Of course in practice people end up parsing them into custom types with 64-bit integers, so this is only a problem if you are manipulating JSON directly, which is very rare in Haskell.
Sounds like Haskell made the right call: put warnings in the docs and steer the user in the right direction. Keeps implementation simple and users in control.
To the point of the article, serde_json support is improving in the next version of BigDecimal, so you'll be able to decorate your BigDecimal fields and it'll parse numeric fields from the JSON source, rather than json -> f64 -> BigDecimal.
    #[derive(Serialize, Deserialize)]
    pub struct MyStruct {
        #[serde(with = "bigdecimal::serde::json_num")]
        value: BigDecimal,
    }
Whether or not this is a good idea is debatable[^], but it's certainly something people have been asking for.[^] Is every part of your system, or your users' systems, going to parse with full precision?
Serde has an interface that allows failing. That one should fail. There is also another that panics, and AFAIK it will automatically panic on any parser that fails.
Do not try to handle huge values, do not pretend your parser is total, and do not pretend it's a correct value.
If you want to create a specialized parser that handles huge numbers, that's great. But any general one must fail on them.
8 billion digits (~100 bits?) is far more than should be used.
Would it be possible to use const generics to expose a `BigDecimal<N>` or `BigDecimal<MinExp, MaxExp, Precision>` type with bounded precision for serde, and disallow this unsafe `BigDecimal` entirely?
If not, I expect BigDecimal will be flagged in a CVE in the near future for causing a denial of service.
An 8 billion digit number is 2.5G? (Did I do my maths right?) All I need to do is shove 1,000 of those in a JSON array, and I'll cause an out-of-memory anyway.
On the other hand, any limit low enough that I can't blow up memory by making an array of 100K or so entries is going to be too low for some people (including me: I often work with numbers that have a few million digits).
Providing some way to set a limit seems sensible, but maybe just make a LimitedBigDecimal type, so that throughout the whole program there is a limit on how much memory BigDecimals can take up? (I haven't looked at the library in detail, sorry.)
My 2 cents anyway.
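A Python sketch of that kind of limit, using the json hooks so the cap applies before anything big gets allocated (MAX_DIGITS is an arbitrary choice, and bounded_decimal is a made-up helper, not part of any library):

```python
import json
from decimal import Decimal

MAX_DIGITS = 100  # arbitrary cap, purely for illustration

def bounded_decimal(literal: str) -> Decimal:
    # The hooks receive the raw literal text, so absurdly long
    # numbers can be rejected before allocating anything big.
    if len(literal) > MAX_DIGITS:
        raise ValueError(f"numeric literal longer than {MAX_DIGITS} chars")
    return Decimal(literal)

doc = json.loads('{"ok": 123.45}',
                 parse_float=bounded_decimal,
                 parse_int=bounded_decimal)
```

A Rust equivalent would need a custom deserializer, but the shape of the check is the same: bound the literal's length, then convert.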
So we had to go through all the code adding quotes to all the ID fields. That was a giant pain in my ass.
I don't follow. 1 - (Number.MAX_SAFE_INTEGER / 2^63) ~ 99.9%, so don't you have a >99% chance of generating an ID that gets truncated in js?
https://en.wikipedia.org/wiki/Double-precision_floating-poin...
That's still going to be a greater than 0.1% chance of hitting a non-representable value though.
Obviously, not all int64 values are representable in float64 (double).
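Easy to see in Python, for what it's worth:

```python
# The largest int64, 2**63 - 1, rounds up to 2.0**63 as a float64,
# so it comes back as a different integer.
n = 2**63 - 1
assert float(n) == 2.0**63
assert int(float(n)) != n

# Exactness is only guaranteed up to 2**53: one step past it,
# distinct integers collapse to the same double.
assert float(2**53) == float(2**53 + 1)
```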
nice pics here: https://en.wikipedia.org/wiki/Floating-point_arithmetic
I must admit I totally forgot about the JSON number issue. Our files include fields for various monetary amounts and similar, and in XML we just used "xs:decimal".
Most will be less than a million and require fewer than four decimal digits. But I guess someone might decide to use it for the equivalent of a space station or similar, ya never know...
Encoding numbers as strings because you are using a language and parser that can't deal with numbers properly (even 64-bit doubles) is a bit of a hack. Basically, the rest of the world giving up because JavaScript can't get its shit together is not a great plan.
As the article said
> RFC 8259 raises the important point that ultimately implementations decide what a JSON number is.
Any implementation dealing with money or monetary rates should know that it needs to deal with precision and act accordingly. If you want to use JavaScript to work with money, you need to get a library that allows you to represent high precision numbers. It's not unreasonable to also expect that you get a JSON parsing library that supports the same.
oh, TIL that you can support large numbers with the default JavaScript JSON library https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
But do note that there are multiple actively used currencies with zero, three, five (rare) or even eight (BTC) decimals, and that some denominations cannot be divided arbitrarily (e.g. only in steps of 0.5).
Point being: floats are dangerously naive for currency. But integers are naive too. You'll most probably want a "currency" or "money" type. Some Value Object, or even Domain Model.
XML offered all this, but in JSON there's little to convey this, other than some nested "object" with at least the decimal amt (as int), and the ISO4217 currency. And maybe -depending on how HATEOAS you wanna be- a formatting string to be used in locales, a rule on divisibility and/or how many decimal places your int or decimal might be.
(FWIW, I built backends for financial systems and apps. It gets worse than this if you do math on the currencies. Some legislations or bookkeeping rules state that calculation uses more or fewer decimals. E.g. that ($10/3)*3 == $10 vs == $9.99, or that $0.03/2 == $0.01 + $0.02, e.g. when splitting a bill. This stuff is complex, but real domain logic)
¹IANAL. But this was told when legal people looked at our architecture.
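Those rounding rules are straightforward to make explicit with Python's decimal module; a sketch (the ROUND_HALF_UP mode is an arbitrary choice here, real bookkeeping rules vary):

```python
from decimal import Decimal, ROUND_HALF_UP

cents = Decimal("0.01")

# Rounding each step to cents makes ($10/3)*3 come out as $9.99.
third = (Decimal("10") / 3).quantize(cents, rounding=ROUND_HALF_UP)
assert third == Decimal("3.33")
assert third * 3 == Decimal("9.99")

# Splitting $0.03: round one share, give the remainder to the other,
# so the shares still sum to the original amount.
total = Decimal("0.03")
a = (total / 2).quantize(cents, rounding=ROUND_HALF_UP)
b = total - a
assert (a, b) == (Decimal("0.02"), Decimal("0.01"))
assert a + b == total
```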
For example, you say you store monetary amounts as cents. What if you needed to store US gas prices, which are normally priced in amounts ending in 9/10ths of a cent? If you want to keep your values as integers you need to change your precision, which will likely mess up a lot of your code.
What's worse is that these things can also change over time and there is sometimes disagreement over what the canonical value is.
E.g. ISO 4217 (used by Safari, Firefox and NodeJS) will say that the Indonesian Rupiah (IDR) uses 2 decimal digits, while Unicode CLDR (used by Chrome) will say that they use 0 decimal digits. The former is the more "legalistic" definition, while the latter matches how people use the currency in reality.
This is not a real issue if you transfer amounts as decimal strings and then pass those to the Intl API for formatting (the formatting will just be different but still correct), but it's catastrophic if you use scaled-up integers (all amounts will be off by magnitudes).
For this reason I would always store currency amounts in an appropriate DECIMAL type in the DB and send currency amounts as strings over the wire.
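A minimal Python sketch of the strings-over-the-wire approach (the field names are made up):

```python
import json
from decimal import Decimal

# Hypothetical wire format: amounts travel as strings, not JSON numbers.
payload = json.dumps({"amount": str(Decimal("19.999")), "currency": "IDR"})

doc = json.loads(payload)
amount = Decimal(doc["amount"])  # exact, no float64 anywhere in the path
assert amount == Decimal("19.999")
```

The receiving side decides how many decimals to render; the wire value stays exact either way.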
It's not widely known, but US gasoline prices are actually in a defined currency unit, the mill (https://en.m.wikipedia.org/wiki/Mill_(currency)).
For most purposes, using mills as the base unit would be sufficient resolution.
My intuition tells me that "x * 1000 / 1000 == x" might not be true for all numbers if you're using floats.
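That intuition is provably right in at least one case; a quick Python check (the scan range is arbitrary):

```python
# Overflow gives a guaranteed counterexample: the intermediate
# product leaves the float64 range entirely.
x = 1e308
assert x * 1000 == float("inf")
assert x * 1000 / 1000 != x

# For finite values, the two roundings (one after *, one after /) can
# also disagree; this arbitrary scan just reports whether any turn up.
mismatches = sum(1 for i in range(1, 100_000)
                 if (i / 7) * 1000 / 1000 != i / 7)
print("finite mismatches in scan:", mismatches)
```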
    if (x == 0) ...

instead of something like

    if (abs(x) < eps) ...

where eps is a suitably defined small number.
https://pkg.go.dev/encoding/json#Decoder.UseNumber
That allows you to capture/forward numbers without any loss of precision.
To wit, Python's json module has `parse_float` and `parse_int` hooks:
https://docs.python.org/3/library/json.html#encoders-and-dec...
Example:
>>> json.loads('{"int":12345,"float":123.45}', parse_int=str, parse_float=str)
{'int': '12345', 'float': '123.45'}
FWIW, when I've cared about interop and controlled the schema, I've specified JSON strings for numbers, along with the range, precision, and representation. This is no worse (nor better) than using RFC 3339 for dates.

- “+1” (not a valid number, according to ECMA-404 and RFC 8259)
- “+0” (also not a valid number, but trickier than “+1” because IEEE floats have “+0” and “-0”)
- “070” (not a valid number, but may get parsed as octal, i.e. as 56)
- “1.” (not a valid number in json)
- “.1” (not a valid number in json)
- “0E-0” (a valid number in json)
There probably are others.
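For what it's worth, CPython's json module handles all of these according to the grammar:

```python
import json

# Each invalid literal above is rejected with a JSONDecodeError...
for bad in ["+1", "+0", "1.", ".1", "070"]:
    try:
        json.loads(bad)
        raise AssertionError(f"{bad!r} unexpectedly parsed")
    except json.JSONDecodeError:
        pass  # rejected, as the grammar requires

# ...while the valid "0E-0" parses to a float zero.
assert json.loads("0E-0") == 0.0
```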
I'm confused by this.
What is the precision of 0.1, relative to IEEE 754?
If I read it correctly, that statement is saying:
json_number_precision(json_number) <= ieee_754_precision
^ How do I calculate these values?

It does seem unclear what it means to exceed precision (given rounding is such an expected part of the way we use these numbers). Magnitude feels easier, as at least you definitely run out of bits in the exponent.
So including the string "0.1" in a message is fine because v = 0.1 implies 0.05 < v < 0.15, but including 0.100000000000000000000000000000000000 would not be.
That's what prometheus is doing for example. https://prometheus.io/docs/prometheus/latest/querying/api/
I mean that rapidjson (C++) parsed the string "0.99999999999999999" as the number 1.0000000000000003. Apart from just looking weird, it's a different float64 bit-pattern: 0x3FF0000000000000 vs 0x3FF0000000000001.
Similarly, serde-json (Rust) parsed "122.416294033786585" as 122.4162940337866. This isn't as obvious a difference, but the bit-patterns differ by one: 0x405E9AA48FBB2888 vs 0x405E9AA48FBB2889. Serde-json does have a "float_roundtrip" feature flag, but it's opt-in, not enabled by default.
For details, look for "rapidjson issue #1773" and "serde_json issue #707" at https://nigeltao.github.io/blog/2020/jsonptr.html
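CPython's float parsing is correctly rounded, so it can serve as a reference point here:

```python
import json
import struct

def bits(x: float) -> str:
    # Big-endian IEEE 754 binary64 bit pattern as hex.
    return struct.pack(">d", x).hex()

# A correctly rounded parser maps "0.99999999999999999" to exactly 1.0,
# i.e. the 0x3FF0000000000000 pattern, not rapidjson's off-by-one.
v = json.loads("0.99999999999999999")
assert v == 1.0
assert bits(v) == "3ff0000000000000"

# CPython's result for the serde_json literal; print the bit pattern
# to compare against the 0x405E9AA48FBB2888 / ...889 pair above.
w = json.loads("122.416294033786585")
print("0x" + bits(w).upper())
```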
Good thing there's msgpack I guess.
The only sane thing with JSON is to avoid numbers altogether and just use decimal-encoded strings. This forces the person parsing it on the other end to at least look up the actual limits defined by your schema.
At least 122.416294033786585 is between ...888 and ...889, though it's much closer to the former.
Say what you want about NaN, but IEEE 754 is the de facto way of dealing with floating point in computers, and even if NaNs and Infs are a bit "fringe", it's unfortunate that the most popular serialization format cannot represent them.
They're not really noisy, but if an application would work with some random noise added, it will probably work with floats; and if it wouldn't work with noise added, it's probably easier to just not use floats than to expect people to reason about IEEE details while risking subtle bugs if different float representations get mixed.
Of course I'm not doing a lot of high performance algorithms, I would imagine in some applications you really do need to reason about floats.
This "Averia Serif Libre" is unreadable for me.
> This specification allows implementations to set limits on the range
> and precision of numbers accepted. Since software that implements
> IEEE 754 binary64 (double precision) numbers [IEEE754] is generally
> available and widely used, good interoperability can be achieved by
> implementations that expect no more precision or range than these
> provide, in the sense that implementations will approximate JSON
> numbers within the expected precision. A JSON number such as 1E400
> or 3.141592653589793238462643383279 may indicate potential
> interoperability problems, since it suggests that the software that
> created it expects receiving software to have greater capabilities
> for numeric magnitude and precision than is widely available.
> Note that when such software is used, numbers that are integers and
> are in the range [-(2**53)+1, (2**53)-1] are interoperable in the
> sense that implementations will agree exactly on their numeric
> values.
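That [-(2**53)+1, (2**53)-1] boundary is easy to check in Python:

```python
# Every integer in the RFC's interoperable range has an exact
# binary64 representation...
assert float(2**53 - 1) == 2**53 - 1

# ...but one step past it, distinct integers collapse together.
assert float(2**53) == float(2**53 + 1)
```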
And yes, this is completely insane for a format that is supposed to be specifically for serialization and interop. Needless to say, the industry has enthusiastically adopted it to the point where it became the standard.

I miss XML these days. Sure, it was verbose and had a bunch of different and probably excessive numeric types defined for XML Schema... but at least they were well-defined (https://www.w3.org/TR/xmlschema-2/#built-in-datatypes). And, on the other hand, without a schema, all you had were strings. Either way, no mismatched expectations.
    $ python3
    Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 100000.000000000017
    100000.00000000001

    Python 3.8.10 (default, Nov 22 2023, 10:22:35)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from decimal import Decimal
    >>> Decimal('100000.000000000017')
    Decimal('100000.000000000017')
For example:

    >>> import json
    >>> json.loads('{"a": 100000.000000000017}')
    {'a': 100000.00000000001}
    >>> json.loads('{"a": 100000.000000000017}', parse_float=Decimal)
    {'a': Decimal('100000.000000000017')}

Alright, if "you" have only ever used Python. In C, for example, we have hexadecimal floating-point literals that represent all floats and doubles exactly (including the infinities and NaNs that make the JSON parser fail miserably).
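Python actually exposes the same exact representation as C's hex float literals, via float.hex() and float.fromhex():

```python
import sys

# float.hex() is Python's counterpart to C's hexadecimal float
# literals: an exact, lossless textual form of every finite double.
x = 0.1
h = x.hex()
assert h == "0x1.999999999999ap-4"
assert float.fromhex(h) == x

# The largest finite double round-trips too.
assert float.fromhex("0x1.fffffffffffffp+1023") == sys.float_info.max
```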
Although it would be good to move in the direction of using a BigDecimal equivalent by default when ingesting unknown data.
- Array
- Binary
- Date
- Decimal128
- Document
- Double
- Int32
- Int64
- MaxKey
- MinKey
- ObjectId
- Regular Expression
- Timestamp

https://www.mongodb.com/docs/manual/reference/mongodb-extend...

It also specs proper decimal values, mitigating the issues presented in the OP.