While most GPUs support FP64, unless you pay for the really high-end scientific computing models, you're typically getting 1/32nd rate compared to FP32 performance. Even your shiny new RTX 4090 runs FP64 at 1/64th rate.
2xFP32 for most basic operations can be 1/4th the rate of FP32. It is quite often the superior solution compared to using the FP64 support provided in GPU languages.
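For anyone who hasn't seen the 2xFP32 trick, here is a minimal sketch of "double-single" addition. It assumes the usual error-free TwoSum building block and emulates FP32 on the CPU by rounding every intermediate result through `struct`. The names `f32`, `two_sum`, `split` and `df_add` are mine, not from any GPU language or library, and a real shader version would likely need care that the compiler does not optimize the error terms away.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """TwoSum in emulated FP32: s is the rounded sum, e the rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def split(x):
    """Split an FP64 value into a (hi, lo) pair of FP32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def df_add(a, b):
    """Add two (hi, lo) pairs and renormalize the result."""
    s, e = two_sum(a[0], b[0])
    e = f32(e + f32(a[1] + b[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo

# A millimetre step taken 10,000 km from the origin.
plain = f32(f32(10_000_000.0) + f32(0.001))   # the step is lost in plain FP32
hi, lo = df_add(split(10_000_000.0), split(0.001))
```

In plain FP32 the step vanishes because the spacing between representable values near 10^7 is 1.0, while the `(hi, lo)` pair keeps it in the low word.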
I wonder if there is a hardware reason for this or if it's just market segmentation by NVIDIA.
I could be wrong about who did that first, but most FPUs are now built like that.
rot13 to avoid spoilers for people who haven't played the game: Gur fha tbrf abin va gjragl-gjb zvahgrf, fb guvf vfa'g npghnyyl na vffhr va cenpgvpr.
EVE Online had (still has?) a similar issue with its camera being able to zoom in on objects that are very far away. Normally, at those distances, you'd be using your overview or the HUD markers, but if you did zoom in on a far object, the origin would still be on your ship (or maybe the center of the area you were in), and the object would get distorted. Especially fun when it was a floating corpse.
Perhaps such a brilliant idea came to them in a dream. But maybe they forgot how they did it in another dream.
Essentially you translate the world back to origin when the player gets too far away.
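A minimal sketch of that origin-shifting idea, assuming positions are stored as plain lists and using a made-up threshold. `World`, `rebase_if_needed` and `REBASE_DISTANCE` are hypothetical names for illustration, not anything from a particular engine.

```python
REBASE_DISTANCE = 1000.0  # hypothetical threshold, in world units

class World:
    def __init__(self, player, objects):
        self.origin = [0.0, 0.0, 0.0]   # accumulated shift, i.e. where local (0,0,0) really is
        self.player = list(player)      # local position of the player
        self.objects = [list(o) for o in objects]  # local positions of everything else

    def rebase_if_needed(self):
        """Shift everything so the player sits at the local origin again."""
        if max(abs(c) for c in self.player) < REBASE_DISTANCE:
            return False
        shift = list(self.player)
        for pos in self.objects:
            for i in range(3):
                pos[i] -= shift[i]
        for i in range(3):
            self.origin[i] += shift[i]
            self.player[i] = 0.0
        return True

world = World(player=(2500.0, 0.0, 0.0), objects=[(2510.0, 0.0, 5.0)])
rebased = world.rebase_if_needed()
```

After the call, the player is back at the local origin, the object sits 10 units away as before, and `world.origin` remembers the total shift so absolute positions can still be recovered.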
Something like Kerbal Space Program is an example where I'd probably break out the doubles.
For final projection, clipping, z buffering, etc., single precision is almost certainly enough.
If you do this on the GPU, then it would need to handle double-precision data.
> Overall, we are quite happy with how this solution turned out. We think it is the closest to "just works" that we can get.
I think this is the crux of it. The performance penalty is very small and the convenience factor is very high.
But, now I want to know what they do with the positions of lights in the scene... Likely transformed to view space regardless for deferred rendering, I'd guess.
The reason is that it is not enough to extend the precision of the 32-bit FP numbers. The exponent range must also be extended. The standard double-precision numbers have an exponent range that is large enough to make underflow and overflow very unlikely in most algorithms. With the very small exponent range of FP32 numbers, underflow and overflow are very likely, and this must be corrected in any double-precision implementation.
So it is not enough to use two FP32 numbers to represent one FP64 number. One must use either a third number for the exponent, or at least one of the two 32-bit numbers must be integer and partitioned into exponent and significand parts.
Both approaches will lead to much more complex algorithms and a much worse speed ratio for FP64 implemented with FP32 vs. FP128 implemented with FP64.
In deep learning, this is huge! If you have numbers this big, then something is definitely already wrong. If you have numbers that small, then you definitely don't care.
I wonder if deep learning will save us from poorly conditioned linear algebra too.
Some of these cannot be represented in single precision, while for the others one or two multiplications or divisions are enough to cause underflows and overflows. Such wide ranges are unavoidable in complex physical simulations, because their origin is in the ratios between quantities at human or astronomic sizes and quantities at atomic or molecular sizes.
Single-precision values are perfectly adequate to represent the input data and the final results of any computation, because 24 bits is about the limit for any analog-to-digital or digital-to-analog conversion, and the exponent range is also sufficient for the physical quantities that can be measured directly. But when you simulate any semiconductor device, or even just an electrical circuit with discrete components, intermediate results very frequently fall far outside the range that can be represented in single precision, even up to 10^60 or 10^-60. When computing a high-order polynomial in order to approximate the solution of some problem, some intermediate values may be even outside that range.
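To make the range problem concrete, here is a small sketch that emulates FP32 by rounding through `struct`. The factors are made up for illustration. The final result fits comfortably in single precision, but the intermediate product does not.

```python
import math
import struct

def f32(x):
    """Round to FP32; values beyond the FP32 range become +/- infinity."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def product_fp32(values):
    """Multiply left to right, rounding every intermediate to FP32."""
    acc = f32(1.0)
    for v in values:
        acc = f32(acc * f32(v))
    return acc

def product_fp64(values):
    """Same product in ordinary FP64."""
    acc = 1.0
    for v in values:
        acc *= v
    return acc

factors = [1e25, 1e25, 1e-30]   # true product is about 1e20
r32 = product_fp32(factors)     # 1e25 * 1e25 already exceeds the FP32 maximum (~3.4e38)
r64 = product_fp64(factors)
```

Here `r32` ends up infinite while `r64` is about 1e20, a value that single precision could have stored without trouble.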
In theory it is possible to avoid underflows and overflows by introducing a large number of scale factors in the equations, in appropriate places.
However, handling those scale factors in a program is extremely tedious and error prone. The floating point format was invented precisely in order to avoid the need of dealing with scale factors. If someone introduces scale factors in a program, they might as well use fixed-point numbers, because the main advantage of the floating-point numbers is lost.
If you want to move in the environment, you want to be able to store the relative positions of a teacup and a table in virtual London with the same precision as in virtual New York. So the coordinates of objects should be stored as integers. Then to render the world, the camera coordinate (also an integer) is subtracted from all objects, with no loss of precision, and the result cast to float for 3D rendering.
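A rough sketch of that scheme, assuming integer millimetres as the world unit and emulating the final cast to FP32 with `struct`. The scene layout, the `NEW_YORK_X_MM` offset and the function names are all hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

MM_PER_M = 1000
NEW_YORK_X_MM = 5_570_000_000  # hypothetical offset from the world origin, in mm

def camera_relative(obj_mm, cam_mm):
    """Exact integer subtraction first, then convert the small result to FP32 metres."""
    return tuple(f32((o - c) / MM_PER_M) for o, c in zip(obj_mm, cam_mm))

def scene(offset_mm):
    """The same table, teacup and camera, placed at a given world offset."""
    table = (offset_mm, 0, 0)
    teacup = (offset_mm + 300, 0, 750)
    camera = (offset_mm - 2000, 0, 1500)
    return camera_relative(table, camera), camera_relative(teacup, camera)

london = scene(0)
new_york = scene(NEW_YORK_X_MM)

# For contrast: store absolute metres in FP32 first, then subtract.
naive_gap = f32(5_570_000.3) - f32(5_570_000.0)
```

Both scenes produce identical camera-space coordinates, whereas the FP32-first version turns a 0.3 m gap into 0.5 m because representable values that far from the origin are 0.5 m apart.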
Back to Godot, I thought the answer would be to precompute the ModelView matrix on the CPU. Object -> World -> Camera is a “large” transformation. But the final Object -> Camera transform is “small”. I’m sure there’s a reason this doesn’t work, but I forget it.
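Just to illustrate the arithmetic behind that idea (this does not address whatever reason made it unsuitable), here is a sketch with translation-only matrices. FP32 is emulated by rounding through `struct`, and the positions and helper names are made up.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def translation(x, y, z):
    """4x4 row-major translation matrix."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b, rnd=lambda v: v):
    """4x4 matrix product; rnd is applied to every input and intermediate."""
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = 0.0
            for k in range(4):
                acc = rnd(acc + rnd(rnd(a[i][k]) * rnd(b[k][j])))
            out[i][j] = acc
    return out

model = translation(10_000_000.25, 0.0, 0.0)  # Object -> World, far from origin
view = translation(-10_000_000.0, 0.0, 0.0)   # World -> Camera

# "CPU" path: multiply in FP64, then cast the small result to FP32.
mv_cpu = [[f32(v) for v in row] for row in matmul(view, model)]
# "GPU" path: cast both matrices to FP32 first, then multiply in FP32.
mv_gpu = matmul(view, model, rnd=f32)
```

The FP64-then-cast path keeps the 0.25 offset between object and camera, while the all-FP32 path loses it before the subtraction ever happens.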
Unreal 5 changes to doubles everywhere for large world coordinates. I wonder what fun issues they had to solve?
That's the core idea here. A bit more detail would help. Is that done in the GPU? Is that extra work for every vertex? Does it slow down rendering because the GPU's 4x4 matrix multiplication hardware can't do it?
I actually have to implement this soon in something I'm doing. So I really want to know.
This is overkill for what I'm doing. They want to zoom way out and see planet-sized objects. I just have a big flat world a few hundred km across. So the usual "offset the render origin" approach will work. I don't have to update on every frame, only when the viewpoint moves a few hundred meters.
Is there a “lossy compression” benefit to describing space with floats?
Also a granularity of 1mm will make slow movement complicated to calculate correctly. Consider updating at 60Hz and having an object that moves 1 inch per second.
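The numbers in that example work out to about 0.42 mm per frame, which rounds to zero. A minimal sketch of the problem and of one possible workaround (carrying the sub-millimetre remainder in integer units) is below; the function names are hypothetical.

```python
HZ = 60
UM_PER_MM = 1000
SPEED_UM_PER_S = 25_400  # 1 inch per second, in micrometres

def step_naive(frames):
    """Round the per-frame movement to whole millimetres every frame."""
    pos_mm = 0
    for _ in range(frames):
        pos_mm += round(SPEED_UM_PER_S / UM_PER_MM / HZ)  # 0.423... rounds to 0
    return pos_mm

def step_with_remainder(frames):
    """Carry the fractional part between frames using integer arithmetic only."""
    pos_mm = 0
    remainder = 0  # counted in units of 1 / (HZ * UM_PER_MM) millimetres
    for _ in range(frames):
        remainder += SPEED_UM_PER_S
        whole, remainder = divmod(remainder, HZ * UM_PER_MM)
        pos_mm += whole
    return pos_mm

naive_mm = step_naive(60)             # one second of simulation
carried_mm = step_with_remainder(60)
```

After one simulated second the naive version has not moved at all, while the version with a remainder has moved 25 mm, with the remaining 0.4 mm still held in the accumulator.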
I was taught that MV/MVP should be calculated CPU-side per-model, and that doing it in the vertex shader is wasteful. Is that advice out of date?
Really half-floats are more interesting, saving 50% memory on the GPU for mesh data. You could imagine using half-floats for animations too!
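A quick sketch of the memory and precision trade-off, using the half-precision `e` format from Python's `struct` module. The vertex values are made up.

```python
import struct

vertex = (0.1, 1.5, 2049.5)  # hypothetical x, y, z of a single vertex

as_f32 = struct.pack("<3f", *vertex)   # three single-precision floats
as_f16 = struct.pack("<3e", *vertex)   # three half-precision floats

# What the mesh data looks like after a round trip through half precision.
roundtrip = struct.unpack("<3e", as_f16)
errors = tuple(abs(r - v) for r, v in zip(roundtrip, vertex))
```

The half-float version is 6 bytes instead of 12, but with only 11 significant bits the error grows quickly with magnitude, so it probably only suits data that stays close to a local origin.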
Then we could have the debate about fixed point vs. floating point. That we choose a precision that deteriorates with distance says something about our short-sightedness in other domains, like the economy for example (let's just print money now, close to origin, and we'll deal with precision problems later, when time moves away from origin).
What you want is fixed point, preferably with integer math so you get deterministic behaviour, even across hardware. Just like float/int arrays both give you CPU-cache and atomic parallelism at the same time, often simplicity is the solution!
In general 64-bit is not interesting at all, so the idea Acorn had with ARM, jumping to 32-bit and staying there forever, is pretty much proven by now. Even if addressing only jumped from 26-bit to 32-bit with ARM6.
Which leads me to the next interesting tidbit: when talking 8-bit, the C64 had 16-bit addressing.
But really all large worlds need chunking.
The real reason AAA never got into user-generated content is that they have staff to create linear worlds.
After this economic crisis, linear content will more or less disappear.
Why listen to a hardcoded story when you can make your own just like in real life?
Scarcity is the key, one UGC networked world will make time valuable.