Why would comparable performance indicate that a model was trained on a preexisting model's output?
I read a similar claim in relation to another model in the past, so I'm just curious how this works technically.