I get that. I'm saying the medium.en model specifically seems to have some weird edge-case behavior that isn't present in the models above or below it in size, or in its closest sibling (the plain 'medium' model).
It's the only one that seems to occasionally spit out significant chunks of training data instead of something that resembles the audio.