Just for your information: when computing trig functions, you first reduce the argument modulo 2π (this is called range reduction). Then you evaluate the function on the reduced interval, usually with a polynomial approximation, possibly piecewise.
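To make those two steps concrete, here is a rough Python sketch. It is only an illustration, not how any particular hardware or libm does it: the names `approx_sin` and `sin_poly` are made up, the polynomial is a plain Taylor series rather than a tuned minimax fit, and the reduction is the naive kind.

```python
import math

# Odd Taylor coefficients of sin: x - x^3/3! + x^5/5! - ... up to x^15.
# Real implementations typically use tuned minimax coefficients instead.
COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(8)]

def sin_poly(r):
    """Evaluate the odd polynomial with Horner's scheme in r*r."""
    r2 = r * r
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * r2 + c
    return acc * r

def approx_sin(x):
    """Step 1: range reduction. Step 2: polynomial on the reduced argument."""
    # IEEE remainder maps x into [-pi, pi] (naive: uses the rounded double 2*pi).
    r = math.remainder(x, 2.0 * math.pi)
    # Fold into [-pi/2, pi/2] using sin(pi - r) == sin(r).
    if r > math.pi / 2:
        r = math.pi - r
    elif r < -math.pi / 2:
        r = -math.pi - r
    return sin_poly(r)

for x in (0.5, 10.0, 100.0):
    print(x, approx_sin(x), math.sin(x))
```

Note that this naive reduction gets worse as the input grows, because 2π is not exactly representable as a double. Careful implementations carry π to extra precision (Cody-Waite or Payne-Hanek style reduction), which is why doing it well for large inputs is not cheap.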
But if it supports large float inputs, it must be doing range reduction, which is impressive for an op that takes so few cycles. It must be done in hardware.
The denormal behavior doesn't surprise me. Denormals are really nice numerically, but they're almost always disabled when you're chasing performance!
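A quick way to see what "nice numerically" means, assuming ordinary IEEE 754 doubles with denormals enabled (which is what CPython gives you on typical platforms): two different numbers never subtract to zero. The variable names below are just for illustration.

```python
import sys

smallest_normal = sys.float_info.min   # 2**-1022, the smallest normal double
a = 1.5 * smallest_normal
b = smallest_normal

# The exact difference is 2**-1023, which is below the normal range.
# With denormals it is representable; with flush-to-zero it would become 0.0.
diff = a - b

print(a != b, diff != 0.0, diff < smallest_normal)
```

With flush-to-zero enabled, `a != b` would still hold but `a - b` would come out as zero, so code that guards a division with `if a != b` could still divide by zero.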