Claude Code primarily uses different "pathways" in Anthropic's LLMs, ones that were post-trained not with RLHF but with RLVR (reinforcement learning with verifiable rewards).
So, his point about code being produced to please the user isn't valid from where I am sitting.