Is multi-token prediction the same as predicting the embedding of a single complex token, one that represents how those input tokens are articulated together in a sentence?