> Training a one bit neural network from scratch is apparently an unsolved problem though.
It was until recently, but there is a new method which trains them directly without any floating point math, using "Boolean variation" instead of Newton/Leibniz differentiation:
Unfortunately the paper seems to have been mostly overlooked. It has only a few citations. I think one practical issue is that that existing training hardware is optimized for floating point operations.