I learned relatively recently that trig functions on the GPU are free if you don’t use too many of them; there’s a separate hardware pipe, so they can execute in parallel with float adds and muls. There’s still extra latency, but it gets hidden as long as there’s enough other work in the vicinity.
Yep, these intrinsics are what I was referring to. And yes, the software versions won’t use the hardware trig unit; they’ll be written using an approximating spline and/or Newton’s method, I would assume, probably mostly using adds and multiplies. Note that the loss of precision with these fast-math intrinsics isn’t much, usually 1 or 2 bits at most.
Yes, but here it's about avoiding multiplication (and division).
I suspect that on a modern processor the branches (i.e. the "if"s) in Bresenham's algorithm are going to be more expensive than the multiplications and divisions in the naive algorithm.
Bresenham is easy to implement branchlessly using conditional moves or arithmetic. It also produces a regular branch pattern that's favorable for modern branch predictors, which can do pattern prediction.