You could just ask it? Or you don’t trust the AI to answer you honestly?
LLMs can't lie nor can they tell the truth. These concepts just don't apply to them.
They also cannot tell you what they were "thinking" when they wrote a piece of code. If you "ask" them what they were thinking, you just get a plausible response, not the "intention" that may or may not have existed in some abstract form in some layer when the system selected tokens*. That information is gone at that point, and the LLM has no means of turning it into something a human could understand anyway. They simply do not have what in a human might be called metacognition. For now. There's lots of ongoing experimental research in this direction though.
Chances are that when you ask an LLM about their output, you'll get the likeness of either someone who has just recognized an issue with their work, or someone who believes they did great work and is now defending it. Obviously this is based on the work itself being fed back through the context window, which informs the response, so it may not be entirely useless, but... this is all very far removed from what a conscious being might explain about their thoughts.
The closest you can currently get to this is reading the "reasoning" tokens, though even those are just some selected system output that is then fed back to inform later output. There's nothing stopping the system from "reasoning" that it should say A, but then outputting B. Example: https://i.imgur.com/e8PX84Z.png
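To make that concrete, here's a toy sketch of the mechanism (all names and the canned text are invented for illustration, not a real model or API): the "reasoning" is just more generated output that gets appended to the context before the visible answer is produced, and nothing structurally binds the answer to it.

```python
# Hypothetical sketch: "reasoning" tokens are ordinary model output
# that is fed back in as extra context. `model` is a stand-in that
# returns canned text; no real LLM or API is involved.

def model(context):
    # Pretend next-token generator: returns a canned continuation
    # depending on whether we're inside the "thinking" phase.
    if "<think>" in context and "</think>" not in context:
        return "I should say A. </think>"
    return "B"  # nothing forces the answer to match the reasoning

def respond(prompt):
    context = prompt + " <think>"
    reasoning = model(context)    # step 1: generate reasoning tokens
    context += " " + reasoning    # step 2: feed them back as context
    answer = model(context)       # step 3: generate the visible answer
    return reasoning, answer

reasoning, answer = respond("Question?")
# The reasoning says "say A", yet the output is "B": the chain of
# thought informs, but does not constrain, the final tokens.
```

The feedback loop is why reasoning tokens often help quality, and also why they're evidence rather than ground truth about "what the model thought".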
* One might say that the LLM itself always considers every possible token and assigns weights to them, so there wouldn't even be a single chain of thought in the first place. More like... every possible "thought" at the same time at varying intensities.
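A toy sketch of that footnote (invented three-word vocabulary and fake scoring function; a real LLM scores on the order of 100k tokens with a neural network): at each step, every token in the vocabulary gets a non-zero weight, and the selection step then throws all but one of those "intensities" away.

```python
import math

# Toy illustration only: the vocabulary and the logits function are
# made up. The point is the shape of the computation, not the values.
VOCAB = ["yes", "no", "<end>"]

def fake_logits(context):
    # Deterministic stand-in for the network's raw output scores.
    h = sum(ord(c) for tok in context for c in tok)
    return {tok: math.sin(h + i) for i, tok in enumerate(VOCAB)}

def next_token_distribution(context):
    logits = fake_logits(context)
    z = sum(math.exp(v) for v in logits.values())
    # Softmax: *every* token receives a non-zero probability...
    return {tok: math.exp(v) / z for tok, v in logits.items()}

dist = next_token_distribution(["Should", "I"])
chosen = max(dist, key=dist.get)  # greedy pick
# ...but only one token survives selection; the weights on all the
# alternatives are discarded and never recorded anywhere.
```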
It sounds like you either have access to bad models or you are just imagining what it’s like to use an LLM in this way and haven’t actually tried asking it why it wrote something. The only judgement you need to make is whether the explanation makes sense or not, not some technical or theoretical argument about where the tokens in the explanation come from. You just ask questions until you can easily verify things for yourself.
Also, pretending that the LLM is still just token predicting and isn’t bringing in a lot of extra context via RAG and using extra tokens for thinking to answer a query is just way out there.
> where the AI wrote some code some way and I had to ask why, it told me why
I just explained that it cannot tell you why. It's simply not how they work. You might as well tell me that it cooked you dinner and did your laundry.
> the code improves.
We can agree on this. The iterative process works. The understanding of it is incorrect. If someone's superficial understanding of a hammer is "tool that drives pointy things into wood", they'll inevitably try to hammer a screw at some point - which might even work, badly.
> It sounds like you either have access to bad models or you are just imagining what it’s like to use an LLM in this way
Quoting this is really enough. You may imagine me sighing.
> Also, pretending that the LLM is still just token predicting
Strawman.
Overall your comment is dancing around engaging with what is being said, so I will not waste my time here.
That is fine. You should, and you'll get the best results doing so.
>LLMs can't lie nor can they tell the truth. These concepts just don't apply to them
Nobody really knows exactly what concepts do and don't apply to them. We simply don't have a great enough understanding of the internal procedures of a trained model.
Ultimately this is all irrelevant. There are multiple indications that the same can be said of humanity: that we perform actions and then rationalize them away without even realizing it. That explanations are often, if not always, post-hoc rationalizations - lies we tell even to ourselves. There's evidence for it. And yet, those explanations can still be useful. And I'm sure OP was trying to point out that this is also the case for LLMs.
There are however limitations imposed by the architecture. An LLM cannot form secret chains of thought (though in theory a closed system outside the end-user's control could hide tokens from at least the user), nor can it manage decent metacognition. They also have an at-best weak concept of fact vs fiction in general, which is why we get hallucinations. None of that makes for good prerequisites for telling lies.
Also, your car isn't a coward because it refuses to run into an obstacle its onboard systems detect. The car's designers may have been cowards. Your car also isn't a hero for protecting you during a crash. Neither are LLMs virtuous or liars. If some AI company went out of their way to intentionally construct an LLM such that it outputs untruths, it's not the LLM that is lying to you, it's OpenAI/Anthropic/whoever you're interacting with. You're using their system. They are responsible for what it does. If it tells untruths, they may have automated the act of telling lies, but it's still them doing it.
> There are multiple indications that the same can be said for humanity, that we perform actions and then rationalize them away even without realizing it
I was hoping to get a response like yours, because I'm genuinely curious about where it leads.
I believe what you said is true in the general sense, where we solve easy problems subconsciously in parts of our brains dedicated to supporting the conscious mind, without then being able to explain how we did it.
However this is a lot less true for engineering tasks, which involve a lot more active planning. Sometimes software development just means being a fancy constraint solver: finding a solution that works while applying some best practices. When pressed why one chose that particular solution, one might be tempted to post-hoc rationalize it as the best solution, even though it was just one that fit. But that's merely making it out to be more than it was, not taking away from the accomplishment of finding one that worked, which likely required some active thinking.
At the other end of the spectrum is making architectural decisions and thinking ahead as one creates something novel. I would be able to tell you why everything exists, especially if I merely added it in anticipation of something that will use it later. There's a ton of conscious planning that goes into these things.
Most coders are still turning over the problems they're dealing with at work in their heads as they're falling asleep at night. This is very much the opposite of solving problems subconsciously.