This Nvidia article [0] gives a good overview of how mixed-precision training works.
At a super high level (from section 3):
1. Converting the model to use the float16 data type where possible.
2. Keeping float32 master weights to accumulate per-iteration weight updates.
3. Using loss scaling to preserve small gradient values.
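The three steps above can be sketched in a few lines of NumPy. This is my own toy illustration, not code from the Nvidia doc: one weight, one gradient, one update, with all names made up.

```python
import numpy as np

loss_scale = np.float32(1024.0)   # step 3: loss-scaling factor
master_w = np.float32(0.5)        # step 2: float32 master copy of a weight
lr = np.float32(0.01)

# Step 1: cast to float16 for the forward/backward pass
# (the actual forward computation is elided in this sketch).
w16 = master_w.astype(np.float16)

# Suppose backprop produces a tiny gradient. Stored directly in
# float16 it underflows to 0.0 (float16 bottoms out around 6e-8):
grad = 1e-8
# Scaling the loss by 1024 scales every gradient by 1024 too,
# so the value survives the float16 backward pass:
scaled_grad16 = np.float16(grad * loss_scale)

# Steps 2+3: unscale in float32, apply the update to the master weight.
grad32 = np.float32(scaled_grad16) / loss_scale
master_w = master_w - lr * grad32

print(np.float16(grad))   # 0.0 -- lost without loss scaling
print(scaled_grad16)      # nonzero -- preserved by scaling
```

The point of the float32 master copy is the same in miniature: tiny per-step updates that would round away in float16 still accumulate correctly in float32.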
[0]
https://docs.nvidia.com/deeplearning/performance/mixed-preci...