Training is currently done in floating-point math, whereas inference can be done in fixed point without much loss of accuracy. Fixed point is ~10x cheaper in power and silicon area for equal throughput.
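A minimal sketch of what fixed-point inference looks like, assuming simple symmetric per-tensor int8 quantization (the function names and numbers here are illustrative, not from any real framework): weights and activations are mapped to 8-bit integers, the matmul runs in integer arithmetic, and a single float rescale recovers the result.

```python
import numpy as np

def quantize(x, bits=8):
    """Map a float array to signed integers with a per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)   # "trained" weights
a = rng.standard_normal((8,)).astype(np.float32)     # input activations

qw, sw = quantize(w)
qa, sa = quantize(a)

y_fixed = (qw @ qa) * (sw * sa)   # integer matmul, one float rescale at the end
y_float = w @ a                   # float reference
```

The expensive inner loop is pure integer multiply-accumulate, which is what makes the dedicated silicon so much cheaper; the quantized output stays close to the float result for inference purposes.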
Also, training requires a lot more RAM per unit of compute, since it has to keep the activations of every layer around for the backward pass, whereas inference can discard each activation as soon as the next layer has consumed it.
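Back-of-the-envelope version of that memory gap, with made-up layer sizes: training holds all activation tensors simultaneously, while inference only ever needs the largest one at a time.

```python
# Hypothetical 50-layer MLP, purely for illustration.
layer_widths = [1024] * 50
batch = 64
bytes_per_float = 4

# Training: every layer's activations stay alive until backprop reaches it.
train_bytes = sum(batch * w * bytes_per_float for w in layer_widths)

# Inference: one activation buffer at a time is enough.
infer_bytes = max(batch * w * bytes_per_float for w in layer_widths)

print(train_bytes // infer_bytes)  # → 50, i.e. ~50x the activation RAM
```

With equal layer widths the ratio is just the depth of the network; real networks vary, but the scaling argument is the same.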
As far as I know, no player that has developed dedicated ML hardware (as opposed to using GPUs) uses the same hardware for both inference and training.