> But reward models are always curated by humans.
There is no inherent reason why they need to be.
> So yeah theoretically you could generate reward models with LLMs, but they won't be any good, unless they are curated by other reward models that are ultimately curated by humans.
This reasoning begs the question: the premise (that LLM-generated reward models won't be any good without human curation) holds only if the conclusion is already assumed. It is therefore circular and logically invalid as an argument.
There is no inherent reason why this needs to be the case.