Intel's optimization manual has suggestions for fast versions in section 15.12.3 (recommended by @James: https://stackoverflow.com/a/2637823/10981777). They don't look especially long to implement for the 22-bit approximation.
https://software.intel.com/content/www/us/en/develop/downloa...
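For a rough idea of the size: as I understand it, the 22-bit flavor is just the hardware reciprocal-square-root estimate (about 12 bits) plus one Newton-Raphson step. Below is a minimal sketch with SSE intrinsics, assuming the section in question is the single-precision square root one. It is not copied from the manual, and `sqrt_nr_ps` is a name I made up.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, ... */

/* Hypothetical helper (not from the Intel manual): approximate sqrt(x)
 * for four packed floats. rsqrtps is good to roughly 12 bits; one
 * Newton-Raphson step roughly doubles that.
 * Caveat: x == 0 produces NaN here (0 * inf); a real version would
 * need to mask that case. */
static inline __m128 sqrt_nr_ps(__m128 x)
{
    const __m128 half  = _mm_set1_ps(0.5f);
    const __m128 three = _mm_set1_ps(3.0f);

    __m128 y  = _mm_rsqrt_ps(x);   /* y  ~ 1/sqrt(x), low precision */
    __m128 xy = _mm_mul_ps(x, y);  /* xy ~ sqrt(x),   low precision */

    /* Newton-Raphson for 1/sqrt(x): y' = 0.5 * y * (3 - x*y*y).
     * Multiplying through by x gives sqrt(x) ~ 0.5 * (x*y) * (3 - (x*y)*y). */
    __m128 t = _mm_sub_ps(three, _mm_mul_ps(xy, y));
    return _mm_mul_ps(_mm_mul_ps(half, xy), t);
}
```

That's one rsqrtps, four multiplies and a subtract per four floats, so it really isn't much code. Whether it beats plain sqrtps depends on the CPU.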
Also, what's SQRD? I'm not finding it referred to anywhere.