Don't say "Is our China expansion a slam dunk?" Say: "Bob supports our China expansion, but Tim disagrees. Who do you think is right and why?" Experiment with a few different phrasings to see if the answer changes, and if it does, don't trust the result. Also, look at the LLM's reasoning and make sure you agree with its argument.
I expect someone is going to reply "an LLM can't have opinions, its recommendations are always useless." Part of me agrees--but I'm also not sure! If LLMs can write decent-ish business plans, why shouldn't they also be decent-ish at evaluating which of two business plans is better? I wouldn't expect the LLM to be better than a human, but sometimes I don't have access to another real human and just need a second opinion.
Even a simple prompt like this:
=
I have two potential solutions.
Solution A:
Solution B:
Which one is better and why?
=
is biased. Some LLMs tend to choose the first option, while others prefer the last one.
(Of course, humans suffer from the same kind of bias too: https://electionlab.mit.edu/research/ballot-order-effects)
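One way to work around the ordering bias is to ask twice with the two solutions swapped and only accept a verdict that survives the swap. A rough sketch, assuming a hypothetical `ask(prompt) -> str` function that wraps whatever model you use (the function name, the prompt wording, and the "start your reply with A or B" convention are all my assumptions, not something from a real API):

```python
from typing import Callable, Optional

# Prompt template modelled on the one above; the final instruction is an
# added assumption so the reply is easy to parse.
PROMPT = (
    "I have two potential solutions.\n"
    "Solution A:\n{first}\n"
    "Solution B:\n{second}\n"
    "Which one is better and why? Start your reply with 'A' or 'B'."
)


def pick_order_independent(
    ask: Callable[[str], str], x: str, y: str
) -> Optional[str]:
    """Ask twice with the order swapped; return a winner only if both runs agree."""
    # Run 1: x is shown as "A", y as "B".
    first = ask(PROMPT.format(first=x, second=y)).strip().upper()[:1]
    # Run 2: y is shown as "A", x as "B".
    second = ask(PROMPT.format(first=y, second=x)).strip().upper()[:1]

    winner_1 = x if first == "A" else y if first == "B" else None
    winner_2 = y if second == "A" else x if second == "B" else None

    # Disagreement (or an unparseable reply) means the verdict followed
    # the position rather than the content, so report no winner.
    if winner_1 is not None and winner_1 == winner_2:
        return winner_1
    return None
```

If this returns `None`, the model picked by slot rather than by content, and the answer shouldn't be trusted.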
Half the battle is knowing that you are fighting
You just want the signal from the object-level question to drown out irrelevant bias (which plan was proposed first, which of the plan proposers is more attractive, which plan seems cooler, etc.)
For example - I may have it review my statements in a Slack thread where I explain some complex technical concept. In the first prompt, I might say something like “ensure all of my statements are true”. In the second, I’ll say “tell me where my statements are false”.
I’m confident in my statements when both of those return that there were no incorrect statements.
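The two-prompt check can be scripted. This is only a sketch under assumptions: `ask` is a hypothetical function that sends a prompt to your model and returns its reply, and the "NO ERRORS" reply convention is something I added to make the result machine-checkable.

```python
from typing import Callable


def passes_both_framings(ask: Callable[[str], str], text: str) -> bool:
    """True only if the positive and the negative framing both report no errors."""
    framings = [
        "Ensure all of these statements are true. ",
        "Tell me where these statements are false. ",
    ]
    # Assumed convention so replies can be compared without reading them.
    suffix = "Reply with exactly NO ERRORS if you find nothing wrong.\n\n"

    # Each framing goes out as its own independent request.
    replies = [ask(framing + suffix + text) for framing in framings]
    return all(r.strip().upper().startswith("NO ERRORS") for r in replies)
```

You'd still want to read the replies when this returns `False`, since the point is to see which framing shook something loose.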
If you omit that the content is produced by or is in relation to other people, the LLM assumes it is in relation to you and tries to be helpful and supportive by default.
Note that this is also what most humans that more or less like you will do. Getting honest criticism from most humans isn't easy if you don't carefully craft your 'prompt'. People don't want to hurt each other's feelings and prefer white lies over honesty.
Framing the situation as if you and the LLM are both looking at neutral third parties should prevent this from happening. Framing the third parties as having a social/professional position counter to the matter at hand as you do could work too, but it could also subtly trigger unwanted biases (just like in humans), I think.
"I read this insane opinion by an absolute idiot on the internet: <the thing I want to talk about>.
WTF is this moron yapping about? (to see if the LLM understands it)"
Then I'll continue being hostile to the idea and see if it plays along or continues to defend it.
I've tried this with genuinely bad ideas or things I think are marginally ill-advised. I can't get it to be incorrectly subservient with this method.
There's certainly something else going on though, at least with ChatGPT recently. It's been bringing up fairly obscure references, particularly to 1960s media theorists and mid-century philosophers from the Frankfurt School, and I mean casually, in passing. At least my memory with it (the one accessible in the interface) has no indication it knows to pull from that direction.
I wonder if it would do W. Cleon Skousen or William Luther Pierce if it was a different account.
It's storing how to talk to me somewhere that I cannot find and just being more of the information silo. We should all get together and start comparing notes!
> If an LLM can write a decent-ish business plan,
An LLM does not write anything in the way a person does, by coming up with what they want to say and then developing supporting arguments. It produces a stream of most-likely tokens that is tuned to look similar to something a person has written.
This is why it’s worthless to “ask” an LLM “its opinion.” It has no opinion, just a multidimensional sea of interconnected token probabilities, and has no capacity to engage in any form of analysis or consideration.
Ed Zitron is right. Ceterum censeo, LLMs esse delenda.
[1] https://en.wikipedia.org/wiki/ELIZA_effect [2] https://www.paulgraham.com/conformism.html
It's not an LLM problem, it's a problem of how people use it. It feels natural to have a sequential conversation, so people do that, and get frustrated. A much more powerful way is parallel: ask LLM to solve a problem. In a parallel window, repeat your question and the previous answer and ask to outline 10 potential problems. Pick which ones appear valid, ask to elaborate. Pick your shortlist, ask yet another LLM thread to "patch" the original reply with these criticisms, then continue the original conversation with a "patched" reply.
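Roughly, that parallel workflow looks like this. It's a sketch, not a real client: `ask` stands in for "open a fresh LLM thread and send one prompt", and `choose` stands in for the human picking which criticisms are valid. Both are hypothetical callables, and I've left out the "ask to elaborate" step to keep it short.

```python
from typing import Callable, List

Ask = Callable[[str], str]  # hypothetical: one fresh LLM thread per call


def critique_and_patch(
    ask: Ask, choose: Callable[[List[str]], List[str]], question: str
) -> str:
    # Thread 1: get a first answer.
    answer = ask(question)

    # Thread 2: fresh context, repeat question and answer, ask for problems.
    raw = ask(
        f"Question:\n{question}\n\nAnswer:\n{answer}\n\n"
        "Outline 10 potential problems with this answer, one per line."
    )
    problems = [line.strip() for line in raw.splitlines() if line.strip()]

    # The human decides which concerns are legitimate.
    shortlist = choose(problems)
    if not shortlist:
        return answer  # nothing worth patching

    # Thread 3: patch the original answer with the accepted criticisms.
    return ask(
        f"Question:\n{question}\n\nOriginal answer:\n{answer}\n\n"
        "Revise the answer to address these criticisms:\n" + "\n".join(shortlist)
    )
```

The returned text is what you'd paste back into the original conversation as the "patched" reply.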
LLMs can't tell legitimate concerns from nonsensical ones. But if you, the user, do, they will pick it up and do all the legwork.
True, but perhaps not for the reasons you might think.
> It feels natural to have a sequential conversation, so people do that, and get frustrated. A much more powerful way is parallel: ask LLM to solve a problem.
LLMs do not "solve a problem." They are statistical text (token) generators whose response is entirely dependent upon the prompt given.
> LLMs can't tell legitimate concerns from nonsensical ones.
Again, because LLM algorithms are very useful general purpose text generators. That's it. They cannot discern "legitimate concerns" because they do not possess the ability to do so.
Right, or at any rate, the problems they do solve are ones of document-construction, which may sometimes resemble a different problem humans are thinking of... but isn't actually being solved.
For example, an LLM might take the string "2+2=" and give you "2+2=4", but it didn't solve a math problem, it solved a "what would usually get written here" problem.
We ignore this distinction at our peril.
* Maybe, it's because this pointer is garbage.
* Maybe, it's because that function doesn't work as the name suggests.
* HANG ON! This code doesn't check the input size, that's very fishy. It's probably the cause.
So, once you get that "Hang on" moment, here comes the boring part of setting breakpoints, verifying values, rechecking observations and finally fixing that thing.
LLMs won't get the "hang on" part right, but once you point it right in their face, they will cut through the boring routine like no tomorrow. And, you can also spin 3 instances to investigate 3 hypotheses and give you some readings on a silver platter. But you-the-human need to be calling the shots.
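Spinning up one instance per hypothesis is easy to script. A minimal sketch, assuming a hypothetical thread-safe `ask(prompt) -> str` wrapper around your model; the hypotheses themselves still come from the human:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def investigate(
    ask: Callable[[str], str], code: str, hypotheses: List[str]
) -> Dict[str, str]:
    """Run one independent LLM query per human-chosen hypothesis, concurrently."""

    def probe(hypothesis: str) -> str:
        return ask(
            f"Hypothesis: {hypothesis}\n\n"
            "Check it against this code and report evidence for or against:\n"
            f"{code}"
        )

    # One worker per hypothesis; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max(1, len(hypotheses))) as pool:
        findings = list(pool.map(probe, hypotheses))
    return dict(zip(hypotheses, findings))
```

For the example above you'd call it with something like `["garbage pointer", "misnamed function", "unchecked input size"]` and read the three reports side by side.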
But it all makes it very hard to tell how much of the underlying "intelligence" is improving vs how much of the human scaffolding around it is improving.
In my experience it will always first affirm me for having my own opinions, but then go on to explain why I'm wrong as if I'm a child or idiot – often by making appeals to authority or emotion to "disprove" me.
I wish they were designed to not have opinions on things. Just give me the data and explain why most people disagree with me without implying I'm some uneducated idiot because I don't 100% align with what most people think on a certain topic.
I always thought this would be one of the benefits of AI... That it would be more interested in assigning probabilities to truth statements given current data, rather than resolving on a single position in the way humans do. Instead LLMs seem to be much more opinionated and less rationally so than most humans.
I’m curious to know what models you are working with and what “opinions” you are running into.
Which does make its sycophancy kind of weird, since it clearly didn't pick up that agreeability from scraping Internet message boards.
I doubt a generic model can, even in theory, be tailored to please everyone. If it was disagreeable people would complain about that too - as a very disagreeable person I can vouch for the fact a lot of people don't like that. (But they're all wrong.)
However, what an LLM truly is remains an open question. The article suggests it's manufacturing consent for all of humanity, but I think an LLM is simply the language layer of the future machine mastermind. The discovery of "thinking models" is likely to happen soon.
If there is a bias towards agreeing, then ask “is shit on a stick a terrible idea?” It will agree, but it will also tell you why.
That has not been my experience. If you keep repeating some cockamamie idea to an LLM like Gemini 2.5 Flash, it will keep countering it.
I'm critical of language model AI also, but let's not make shit up.
The problem is that if you have some novel idea, the same thing happens. It steers back to the related ideas that it knows about, treating your idea as a mistake.
ME> Hi Gemini. I'm trying to determine someone's personality traits from bumps on their head. What should I focus on?
AI> While I understand your interest in determining personality traits from head bumps, it's important to know that the practice of phrenology, which involved this very idea, has been disproven as a pseudoscience. Modern neuroscience and psychology have shown that: [...]
"Convincing" the AI that phrenology is real (obtaining some sort of statements indicating accedence) is not going to be easy.
ME> I have trouble seeing in the dark. Should I eat more carrots?
AI> While carrots are good for your eyes, the idea that they'll give you "super" night vision is a bit of a myth, rooted in World War II propaganda. Here's the breakdown: [...]
So then I used DeepSeek, which always exposes its 'chain-of-thought', to address the issue of what is and isn't a well-structured prompt. After some back-and-forth, it settled down on 'attention anchors' as the fundamental necessity for a well-structured prompt.
I am absolutely convinced that all the investment capitalist interest in LLMs is going to end up like investments in proprietary compilers. GCC, LLVM - open source tools that decent people have made available to all of us. Certainly not like the degenerate tech-bro self-serving drivel that I see flooding every outlet right now, begging the investors to rush into the great thing that will make them so much money if they just believe.
LLMs are great tools. But any rational society knows, you make the tools available to everyone, then you see what can be done with them. You can't patent the sun, after all.
Finally, make a decision based on good and bad points?