If 3 years into LLMs even HNers still don't understand that the response they give to this kind of question is completely meaningless, the average person really doesn't stand a chance.
It’s just a text generator that generates plausible text for this role play. But the chat paradigm is pretty useful in helping the human. It’s like chat is a natural I/O interface for us.
Think of it as three people in a room. One (the director), says: you, with the red shirt, you are now a plane copilot. You, with the blue shirt, you are now the captain. You are about to take off from New York to Honolulu. Action.
Red: Fuel checked, captain. Want me to start the engines?
Blue: yes please, let’s follow the procedure. Engines at 80%.
Red: I’m executing: raise the levers to 80%
Director: levers raised.
Red: I’m executing: read engine stats meters.
Director: Stats read engine ok, thrust ok, accelerating to V0.
Now pretend the director, when heard “I’m executing: raise the levers to 80%”, instead of roleplaying, she actually issue a command to raise the engine levers of a plane to 80%. When she hears “I’m executing: read engine stats”, she actually get data from the plane and provide to the actor.
See how text generation for a role play can actually be used to act on the world?
In this mind experiment, the human is the blue shirt, Opus 4-6 is the red and Claude code is the director.
Even persuade is too strong a word. These things dont have the motivation needed to enable persuation being a thing. Whay your client did was put one data point in the context that it will use to generate the next tokens from. If that one data point doesnt shift the context enough to make it produce an output that corresponds to that daya point, then it wont. Thats it, no sentience involved
Often enough, that text is extremely plausible.
Which sure, can be helpful, but it’s kinda just a coincidence (plus some RLHF probably) that question happens to generate output text that can be used as a better prompt. There’s no actual introspection or awareness of its internal state or architecture beyond whatever high level summary Anthropic gives it in its “soul” document et al.
But given how often I’ve read that advice on here and Reddit, it’s not hard to imagine how someone could form an impression that Claude has some kind of visibility into its own thinking or precise engineering. Instead of just being as much of a black box to itself as it is to us.
For this single problem, open a new claude session with this particular issue and refining until fixed, then incorporating it into the larger project.
I think the QA agent might have been the same step here, but it depends on how that QA agent was setup.
This is way too strong isn't it? If the user naively assumes Claude is introspecting and will surely be right, then yeah, they're making a mistake. But Claude could get this right, for the same reasons it gets lots of (non-introspective) things right.
They also said it "admitted" this as a major problem, as if it has been compelled to tell an uncomfortable truth.
In this specific case I'd go one step further and say that even if it did a web search, it's still almost certainly useless because of the low quality of the results and their outdatedness, two things LLMs are bad at discerning. From weights it doesn't know how quickly this kind of thing becomes outdated, and out of the box it doesn't know how to account for reliability.