1. You can use a Fourier transform to get the coefficients in O(n log n) time.
2. So, multiplying two approximations only takes O(n log n) time.
3. Also, adding, integrating, or taking the derivative only take O(n) time.
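A minimal sketch of that FFT route, assuming NumPy (the helper name `cheb_coeffs` and the even-extension trick that turns the FFT into a DCT-I are my own choices, not from the comment above):

```python
import numpy as np

def cheb_coeffs(f, n):
    # Sample f at the n+1 Chebyshev extreme points x_k = cos(pi*k/n)
    k = np.arange(n + 1)
    x = np.cos(np.pi * k / n)
    y = f(x)
    # Mirror the samples and take an FFT: on this even extension the FFT
    # is a DCT-I in disguise, so the whole transform costs O(n log n)
    ext = np.concatenate([y, y[-2:0:-1]])   # length 2n, even-symmetric
    c = np.real(np.fft.fft(ext)) / n
    c[0] /= 2                                # endpoint terms are halved
    c[n] /= 2
    return c[:n + 1]
```

As a sanity check, `cheb_coeffs(lambda t: 2*t**2 - 1, 2)` should recover T_2 exactly (coefficients 0, 0, 1), and `np.polynomial.chebyshev.chebval` can evaluate the resulting series.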
This is why chebfun/chebpy can run so fast while magically finding roots/derivatives/etc. A couple of other interesting facts:
1. Remember the double-angle formula? There's a more general recursion for the Chebyshev polynomials:
\[ T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x). \]
So, e.g.
\[ T_2(\cos\theta) = \cos(2\theta) = 2\cos^2\theta - 1 = 2\cos\theta \, T_1(\cos\theta) - T_0(\cos\theta). \]
2. Computers actually use this recursion to calculate sines and cosines! So, it's a little inefficient to code your Chebyshev polynomials using `math.sin`.
3. Using generating functions, you can get a closed form for T_n(x) that takes only O(log n) time to calculate. (That assumes multiplications cost O(1); since you actually need about O(log n) bits to represent x accurately, it's more precisely O((log n)^2 log log n).)
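Both facts are easy to check numerically. A sketch (function names are mine), comparing the three-term recursion against cos(n·arccos x) and against one standard closed form, T_n(x) = ((x + √(x²−1))ⁿ + (x − √(x²−1))ⁿ)/2:

```python
import cmath

def cheb_T(n, x):
    # Three-term recursion T_n = 2x*T_{n-1} - T_{n-2}: O(n) multiplications
    t0, t1 = 1.0, x
    for _ in range(n - 1):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1 if n else 1.0

def cheb_T_closed(n, x):
    # Closed form ((x + s)^n + (x - s)^n) / 2 with s = sqrt(x^2 - 1);
    # s is imaginary for |x| < 1, hence the complex arithmetic. Done by
    # repeated squaring, the exponentiation costs O(log n) multiplications.
    s = cmath.sqrt(x * x - 1)
    return (((x + s) ** n + (x - s) ** n) / 2).real
```

For example, `cheb_T(2, 0.5)` and `cheb_T_closed(2, 0.5)` should both equal 2·(0.5)² − 1 = −0.5, matching the double-angle identity above.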
[1]: https://en.wikipedia.org/wiki/Remez_algorithm
[2]: https://www.sollya.org/
https://people.maths.ox.ac.uk/trefethen/ATAP/
Also chebfun!
I wrote that code (to compute sqrt 2) in 1974 or 1975 in IBM 360 Assembler. I used a conditional macro constant that increased the number of iterations of Newton's method from 2 to 3, just in case the client wanted double precision.
as i understand it, other commonly-used strategies include spline interpolation (using second-, third-, or even fourth-order interpolation, requiring respectively three, four, and five multiplications, which can be done concurrently) and, in suitable cases like this example, newton iteration from an initial table-lookup guess
unlike the piecewise-taylor approach outlined early in the article, spline interpolation only requires storing a tiny amount more data than simple nearest-neighbor table lookup (potentially three more points for fourth-order interpolation, so a 256-entry table becomes 259 entries)
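a minimal sketch of the newton-from-a-table-lookup idea for sqrt (the 256-entry table, the [1, 4) reduction range, and the function name are my own choices, not from the comment above):

```python
import math

# hypothetical 256-entry table of sqrt over [1, 4), indexed by (x - 1) / 3
TABLE = [math.sqrt(1.0 + 3.0 * i / 256) for i in range(256)]

def sqrt_newton(x):
    # reduce x into [1, 4) by pulling out an even power of two,
    # so sqrt of the factored-out part is exact
    e = 0
    while x >= 4.0:
        x /= 4.0
        e += 1
    while x < 1.0:
        x *= 4.0
        e -= 1
    # the table lookup gives roughly 8 good bits; each newton step
    # roughly doubles the number of correct bits, so two steps give
    # ~40 bits and a third would reach full double precision
    y = TABLE[int((x - 1.0) / 3.0 * 256)]
    for _ in range(2):
        y = 0.5 * (y + x / y)
    return y * 2.0 ** e
```

each newton step here is one add, one multiply, and one divide, and the initial guess is a single indexed load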
on a different topic, i think it's easy to find embedded dsp applications where the easiest solution uses fourier transforms, which usually do require high-precision floating point. machine vision, radio communication, musical applications, etc.
incidentally, if you find yourself in a situation where you actually need the taylor expansion of √(1+x) or √(½+x) or something, and you don't want to do a bunch of pencil-and-paper algebra (or don't trust yourself), pari/gp has your back:
? sqrt(1+x) + O(x^5)
%5 = 1 + 1/2*x - 1/8*x^2 + 1/16*x^3 - 5/128*x^4 + O(x^5)
? sqrt(1+x) + O(x^7)
%7 = 1 + 1/2*x - 1/8*x^2 + 1/16*x^3 - 5/128*x^4 + 7/256*x^5 - 21/1024*x^6 + O(x^7)
? sqrt(1/2+x) + O(x^5)
%6 = 0.70710678118654752440084436210484903928 + 0.70710678118654752440084436210484903928*x - 0.35355339059327376220042218105242451964*x^2 + 0.35355339059327376220042218105242451964*x^3 - 0.44194173824159220275052772631553064955*x^4 + O(x^5)

I use Chebyshev polynomials extensively in finance, and I have tried problems like MNIST with Chebyshev approaches; they get close to CNNs in accuracy.
The ApproxFun Julia package is pretty cool for Chebyshev work.
What is your architecture? Is it just a fully connected layer of Chebyshev polynomials?
Chebyshev Approximation - https://news.ycombinator.com/item?id=10115336 - Aug 2015 (60 comments)
type Result is delta 0.0001 range -100.0 .. 100.0;
and the compiler will figure out how to give you fast math with only the accuracy and resolution that you need!
my experience outside of ada is that decimal arithmetic is never fast on binary computers