If you're interested, you can peruse the C code that was used to generate the tables. Here's the file for sine/cosine:
https://github.com/RandalLinden/DOOM-FX/blob/master/source/m...
Pentium was fast enough that it didn't matter as much.
Just a few years later, reading from a precomputed trig table was actually slower than computing the value directly.
The approach is older than that. I remember my grandfather's engineering books from the 1950s – nearly every one of them had a large addendum with said pre-calculated sine, cosine, tangent and logarithm lookup tables. And there was at least one book that contained only such tables and nothing else.
That is how engineers used to calculate before the advent of computers.
It is interesting that you can get very decent results even with low quality tables. Of course there will be artifacts but due to the randomness of a path tracer this is not always very noticeable.
Is this really the order of events? I imagine the pre-calculated route is what you'd try first, and only go for extra hardware if that failed somehow.
E.g. I would assume that Math.sin(x) returns the same thing in NodeJS on Windows and Mac/M1, but it turns out it is not necessarily so. https://stackoverflow.com/questions/74074312/standard-math-f...
https://en.wikipedia.org/wiki/Rounding#Table-maker's_dilemma
True in general, but for the basic datatypes sent through hardware registers your processor architecture has fixed precision, so the time and space complexity is O(1).
We implemented a deterministic approximation, and moved on. But I learned something important about trig functions that day.
So while what you say is true about the x87 implementation of those functions, for anything targeting a machine built in the last 20 years it's likely the code will run consistently regardless of the architecture (barring architecture floating point bugs, which aren't terribly uncommon in the less significant bits and when overclocking comes into play).
x86 compilers won't use x87 instructions when SSE2 and later are available. x87 is just a really weird and funky instruction set that's best left in the gutter of history.
Unfortunately, floating point results will probably continue to differ across platforms for the foreseeable future.
Hmmm, can you use long doubles in SSE or AVX? They are glorious, and as far as I can see from playing with Godbolt, they still require dirtying your hands with the x87 stack.
I would use the word "consistent."
Non-determinism implies randomness.
Plan 9: https://9p.io/sources/plan9/sys/src/libc/port/sin.c
Freebsd: https://cgit.freebsd.org/src/tree/lib/msun/src/k_sin.c
and the directory name is literally: `msun`
"Finding the BEST sine function for Nintendo 64"
The degree-9 polynomial, said to be a thousand times better than the original Taylor approximation in maximum error, also appears to be very close to the Taylor series in the first place.
Writing out the Taylor coefficients:
1/3! = 0.166667
1/5! = 0.008333
1/7! = 0.000198
1/9! = 0.0000027(56)
The first two are exact to the digits shown, the third matches the article's to 5 decimal places only (the article has 0.000190), and the fourth differs already in its second significant digit (the article has 0.0000026019).
The delta in the 9th-order coefficient is expected if you truncate the Taylor series from the 11th order onwards (− x^11/11! + x^13/13! − ...): folding that slightly negative tail into the x^9 term lowers it a bit.
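For reference, the exact leading digits of those coefficients (a quick Python check):

```python
import math

# Exact Taylor coefficients of sin(x) for the odd powers discussed above.
for n in (3, 5, 7, 9):
    print(f"1/{n}! = {1 / math.factorial(n):.10f}")
# 1/3! = 0.1666666667
# 1/5! = 0.0083333333
# 1/7! = 0.0001984127
# 1/9! = 0.0000027557
```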
https://www.wolframalpha.com/input?i=plot+sin%28x%29+-+P%28x....
You'll note that the error is small near zero and large near pi/4, which is characteristic of a Taylor series around zero, rather than the level equioscillation typical of Remez approximations[1]. Note also that the polynomial only includes odd terms, which is not something I would expect from Remez unless it was run on the symmetric interval [-pi/4, pi/4].
I ran Remez for the problem and after 4 iterations obtained a degree 8 polynomial with error less than 1e-9, but it didn't look anything like the polynomial given in the article.
1.7209863008943345e-05 x^8 - 0.00024575124459624625 x^7 + 7.194649190849227e-05 x^6 + 0.008268794893899754 x^5 + 3.425379759410762e-05 x^4 - 0.16667692317020713 x^3 + 1.5422400957890642e-06 x^2 + 0.9999999106213322 x + 8.526477071446453e-10
Although of course the first few digits of the low-order matching terms will be very similar - any polynomial approximation method will agree there, because they are fitting the same function, after all. But by the time we reach x^5 or x^7 the agreement is very loose; really only the first couple of digits are the same.

Yes, I too was expecting oscillations (since I saw Chebyshev polynomials mentioned on the wiki page). I'm beginning to think that the word "might" in their statement "One way to calculate is Remez’s algorithm. The results might look like this:" is doing a lot of heavy lifting.
A couple of notes about your results:
- The interval only needs to be [0, pi/16] and not [0, pi/4]; adjusting that in your wolframalpha query yields a similar bound to what the author lists: 3.5e-9. (Though the same observation you made about the behavior near the bounds applies.)
- The polynomial you obtained actually looks pretty close to the Taylor one: the even coefficients are small, and rounding to 5 or 6 digits might yield a close one, maybe if we look for a degree 9 polynomial instead of degree 8. (I don't have an implementation of Remez handy right now to try it myself).
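A quick stand-in, though: interpolating sin at Chebyshev nodes is near-minimax (not true Remez, but close and easy with NumPy), and on a symmetric interval it reproduces the odd-only structure:

```python
import numpy as np

# Interpolate sin at Chebyshev nodes on [-pi/4, pi/4]: a cheap near-minimax
# stand-in for Remez. On a symmetric interval the even-power coefficients
# come out ~0 (up to rounding), matching the observation above.
deg = 9
k = np.arange(deg + 1)
nodes = np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))   # Chebyshev nodes in [-1, 1]
x = nodes * (np.pi / 4)                                 # scaled to [-pi/4, pi/4]
coeffs = np.polyfit(x, np.sin(x), deg)                  # highest power first

xs = np.linspace(-np.pi / 4, np.pi / 4, 10001)
max_err = np.max(np.abs(np.polyval(coeffs, xs) - np.sin(xs)))
print(max_err)       # ~1e-10 or better
print(coeffs[1::2])  # even-power coefficients (x^8, x^6, x^4, x^2, x^0), all ~0
```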
> Although of course the first few digits of the low-order matching terms will be very similar
Makes sense.
It’s a very simple iterative algorithm, essentially the dumbest thing that could possibly work (like most good algorithms). It fails to converge for functions that have poles nearby unless you have a very good initial guess (the Chebyshev or Carathéodory-Fejér approximants are ~always good starting points and easily computed). In practice you want to optimize a weighted L-inf norm rather than absolute, because floating-point errors are measured in a relative norm.
https://basesandframes.files.wordpress.com/2016/05/rgreenfas...
-(-(sin(x))), or for style points just -cos'
If you use cos(x) = 1 then you are still accurate to within 1%
[edit]
I think I also found an error in TFA? It seems like picking the "best" N would allow r to be in the range -pi/32 < r < pi/32, which makes the 3rd order Taylor series have an error of 4e-12, significantly better than the error range for 0 < r < pi/16
(Also has a nice property, apparently, that CORDIC-like routines exist for a bunch of special functions and they're very similar to each other. Does anyone have a good resource for learning the details of those algorithms? They sound elegant).
There are a lot of references given in the book including more details on CORDIC.
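For a taste of the idea, here's a minimal rotation-mode CORDIC in floating-point Python. Real hardware uses fixed point and shifts; the iteration count and names here are just illustrative:

```python
import math

N = 40                                    # iterations ~ bits of accuracy
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]
K = 1.0
for i in range(N):
    K /= math.sqrt(1.0 + 4.0 ** -i)       # cumulative CORDIC gain correction

def cordic_sin_cos(theta):
    # Rotation mode: start at (K, 0) and rotate by +/- atan(2^-i) steps
    # until the residual angle z is driven to ~0. Converges for
    # |theta| <= sum(ANGLES) ~ 1.74 rad.
    x, y, z = K, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0     # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x                           # (sin(theta), cos(theta))

s, c = cordic_sin_cos(1.0)
```

Each step only needs a shift, an add, and one table entry, which is why it was attractive for hardware with no multiplier.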
That said, last time I had to do that in software, I used Taylor series. Might not have been an optimal solution.
EDIT:
AMD's Zen 4 takes 50-200 cycles (latency) to compute sine. I think that strongly suggests AMD uses CORDIC. https://www.agner.org/optimize/instruction_tables.pdf page 130.
Same for Intel, Tiger Lake (Intel gen 11) has 60-120 cycles of latency. Page 353.
I'd guess usually ~50 cycles on Zen 4 (and ~60 on Intel) for float32, with the float64/float80 datatypes toward the upper end of the range. Denormals might also cost more cycles.
Couldn't wrap my head around the concept until the topic was revisited in college, but the idea was there and helped me understand the tradeoffs brought by the "fast math" libraries I was using during high school.
They use a lookup table for nπ/16, and then a polynomial to approximate the difference to other values.
And that's only float32!
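A hypothetical sketch of that scheme in double precision (the table layout, polynomial degrees, and names here are my own choices, not the library's):

```python
import math

# Table of sin/cos at multiples of pi/16, plus short Taylor polynomials
# for the remainder r. Rounding to the NEAREST multiple keeps |r| <= pi/32.
STEP = math.pi / 16
SIN_TAB = [math.sin(n * STEP) for n in range(32)]   # one full period = 32 steps
COS_TAB = [math.cos(n * STEP) for n in range(32)]

def sin_table(x):
    n = round(x / STEP)                             # nearest multiple
    r = x - n * STEP                                # |r| <= pi/32 ~ 0.098
    sin_r = r - r**3 / 6 + r**5 / 120               # truncation error ~ r^7/5040
    cos_r = 1 - r**2 / 2 + r**4 / 24 - r**6 / 720
    # Angle addition: sin(n*STEP + r) = sin(n*STEP)cos(r) + cos(n*STEP)sin(r)
    return SIN_TAB[n % 32] * cos_r + COS_TAB[n % 32] * sin_r
```

With just 32 table entries this stays within a few 1e-11 of math.sin over several periods.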
I’m brushing up my math basics (I graduated CS while dodging the math requirements) and it frustrates me that in trig I need to remember values at all. Values such as sqrt(2)/2 make sense, but how hard is it to calculate sin(5 degrees) by hand?
0 is a decent approximation of sin near 0.
x is a better one.
x - x^3/6 is an even better one.
x - x^3/6 + x^5/120 ...
Note that x here is in radians rather than degrees, so convert (degrees * pi/180) first. Keep adding terms until you're satisfied with how many stable digits you get.
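Putting those steps into a sketch (Python; the stopping tolerance is an arbitrary choice):

```python
import math

def sin_taylor(degrees, tol=1e-12):
    x = math.radians(degrees)                  # degrees * pi/180
    total, term, n = 0.0, x, 1
    while abs(term) > tol:                     # stop once terms stop mattering
        total += term
        term *= -x * x / ((n + 1) * (n + 2))   # next odd-order Taylor term
        n += 2
    return total

print(sin_taylor(5))   # sin(5 degrees) ~ 0.0871557
```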
I wonder what the first program was that ever used a precomputed trig table.
Any calculating job can be undertaken by a proper Turing machine. You just have to keep in mind the old triangle of usage: Cost, capability and speed.
If a human can calculate a sine, so can any full-blown computer.
For comparison:
- Apple’s M2 Ultra has about 134 billion transistors. That’s about 2^37.
- Avogadro’s number is about 2^79.
Reducing the argument to a small range around zero decreases that a lot, but not nearly enough. There are 2^52 doubles in [0.5, 1.0), 2^52 more in [0.25, 0.5), 2^52 more in [0.125, 0.25), etc., so you’d still easily need 2^52 entries or 2^58 bits (likely way, way more).
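The per-binade count is easy to verify, since consecutive doubles have consecutive bit patterns (Python sketch):

```python
import struct

def bits(x):
    # Raw IEEE-754 bit pattern of a double; adjacent doubles within a
    # binade have adjacent patterns, so subtracting patterns counts them.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

count = bits(1.0) - bits(0.5)
print(count == 2 ** 52)   # True: one binade holds 2^52 doubles
```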
https://en.m.wikipedia.org/wiki/Pentium_FDIV_bug
“It is implemented using a programmable logic array with 2,048 cells, of which 1,066 cells should have been populated with one of five values: −2, −1, 0, +1, +2.”
Not sure what you’re trying to demonstrate, they wouldn’t store every single float!! I hope you don’t program. ;)
Given the ridiculous number of transistors and so on we can use in CPUs and GPUs nowadays, how feasible is a relatively huge trig lookup table burned into ROM?
Such a table for double-precision would be much, much larger.
If you don't believe me, try x=10.
You wouldn't evaluate the terms naively from left-to-right.
x - x^3/3! + x^5/5! - ... = x * (1 - x^2/(2*3) * (1 - x^2/(4*5) * ... ) )
I just checked in python and you get a result that is around 1000*machine epsilon off. Not great, not terrible.
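For anyone who wants to reproduce that check, a sketch (my own term recurrence, not necessarily the exact code used above):

```python
import math
import sys

# Sum the Taylor series for sin(10) naively, left to right. The
# intermediate terms peak around 2.7e3, so several leading digits cancel
# and the last digits of the result are noise.
def sin_naive(x, terms=40):
    total, term = 0.0, x
    for n in range(1, 2 * terms, 2):
        total += term
        term *= -x * x / ((n + 1) * (n + 2))   # next odd-order Taylor term
    return total

err = abs(sin_naive(10.0) - math.sin(10.0))
print(err / sys.float_info.epsilon)   # far more than a few epsilons
```

For small x (say x=1) the same loop is accurate to within a few epsilons; the blow-up at x=10 is purely the cancellation, which is why real implementations reduce the range first.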