The problem with the hype around LLMs is that people without much experience in the field can't think of anything else.
So much so that they forget the basics of the discipline.
What do you think cross validation is for?
To compare the different weights obtained from different initializations, different topologies, different hyper-parameters... all trained on the same training dataset.
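As a minimal sketch (pure NumPy, with an illustrative synthetic dataset and hyper-parameter, not any specific framework), this is what comparing two configurations via k-fold cross validation on the same training data looks like:

```python
import numpy as np

# Illustrative dataset: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 80)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=80)

def cv_mse(degree, k=5):
    """Mean k-fold cross-validation MSE of a polynomial fit of
    the given degree, trained on the same dataset each time."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train], y[train], degree)      # train on k-1 folds
        pred = np.polyval(coefs, x[test])                   # evaluate on held-out fold
        errors.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errors))

# Same data, two candidate hyper-parameters (polynomial degree):
for degree in (1, 9):
    print(f"degree={degree}: CV MSE = {cv_mse(degree):.4f}")
```

The point is that the comparison is between configurations, with the training data held fixed; cross validation tells you which configuration generalizes, not which weights to keep.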
Even for LLMs: have you ever tried to reduce the vocabulary size of, say, Llama?
No?
Yet it's a totally reasonable modification.
What's the preferred form to make modifications like this?
Can you do it by fine-tuning Llama's weights?
No.
You need training data.
That's why training data is the preferred form for making modifications: whatever the AI (hyped or not), it's the only form that lets you make every modification you want.
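A minimal sketch of why the vocabulary example above is out of reach for fine-tuning (the shapes below are purely illustrative, chosen tiny for clarity; Llama-2 7B is on the order of 32000 x 4096):

```python
import numpy as np

# The token-embedding matrix is shaped by the vocabulary size,
# so a smaller vocabulary means differently shaped parameters.
old_vocab, d_model = 100, 8
new_vocab = 40

embed = np.zeros((old_vocab, d_model))      # "pretrained" token embeddings
new_embed = np.zeros((new_vocab, d_model))  # embeddings for the smaller vocab

# Fine-tuning applies gradient updates to existing tensors:
# it can change their values, never their shapes.
assert embed.shape != new_embed.shape

# And even after resizing, every text must be re-tokenized with the
# new vocabulary before the model can relearn it, which is why you
# need the training data, not just the weights.
print("old:", embed.shape, "new:", new_embed.shape)
```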