Normally, ML models are trained with backpropagation, which is an inherently synchronous technique: each layer's backward pass depends on the gradient produced by the layer after it, so if you try to trivially parallelize it across machines, it doesn't work.
This lets you train a machine learning model of arbitrary size (bigger than fits on one GPU, or even on a multi-GPU node) by sharding it across actors in a distributed system. There is a small penalty in the number of training cycles needed to converge, but it's far less than the cost of tight coordination.
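As a rough illustration of the sharding idea (not the actual asynchronous scheme described here), the following sketch splits a model across two actors, each owning one layer. The `ShardActor` class and the simulated message passing are hypothetical names for illustration; in a real system each actor would run in its own process or on its own GPU and exchange activations and gradients as messages.

```python
import numpy as np

class ShardActor:
    """One actor owning a shard (here, a single linear layer) of the model.
    Direct method calls stand in for the messages a real actor system
    would send between processes or machines."""
    def __init__(self, in_dim, out_dim, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0, 0.5, (in_dim, out_dim))
        self.b = np.zeros(out_dim)
        self.lr = lr

    def forward(self, x):
        self.x = x                     # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        grad_in = grad_out @ self.W.T  # gradient "message" to the upstream actor
        self.W -= self.lr * (self.x.T @ grad_out)
        self.b -= self.lr * grad_out.sum(axis=0)
        return grad_in

def train_step(actors, x, y):
    """One step: activations flow forward through the chain of actors,
    gradients flow back. Returns the mean-squared-error loss."""
    h = x
    for a in actors:
        h = a.forward(h)
    grad = 2.0 * (h - y) / len(x)      # dMSE/dh
    for a in reversed(actors):
        grad = a.backward(grad)
    return float(np.mean((h - y) ** 2))

# Tiny demo: learn y = x @ M with the model split across two actors.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 4))
M = rng.normal(size=(4, 2))
Y = X @ M
actors = [ShardActor(4, 8, seed=1), ShardActor(8, 2, seed=2)]
losses = [train_step(actors, X, Y) for _ in range(200)]
```

Note that this sketch is still synchronous end to end; the point of the actor-based approach is to relax exactly that coordination, accepting slightly stale gradients in exchange for actors that never block on each other.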