>We weren’t there to rehash prompt engineering tips.
>We talked about context engineering, inference stack design, and what it takes to scale agentic systems inside enterprise environments. If “prompting” is the tip of the iceberg, this panel dove into the cold, complex mass underneath: context selection, semantic layers, memory orchestration, governance, and multi-model routing.
I bet those four people love that the moderator took a couple notes and then asked ChatGPT to write a blog post.
As always, the number one tell of LLM output, besides the tone, is that by default it will never include links in the body of the post.
Why can’t anyone be bothered anymore to write actual content, especially when writing about AI, where your whole audience is probably already exposed to these patterns in content day in, day out?
It comes off as so cheap.
The real insight: have some fucking pride in what you make, be it a blog post, or a piece of software.
> One panelist shared a personal story that crystallized the challenge: his wife refuses to let him use Tesla’s autopilot. Why? Not because it doesn’t work, but because she doesn’t trust it.
> Trust isn’t about raw capability, it’s about consistent, explainable, auditable behavior.
> One panelist described asking ChatGPT for family movie recommendations, only to have it respond with suggestions tailored to his children by name, Claire and Brandon. His reaction? “I don’t like this answer. Why do you know my son and my girl so much? Don’t touch my privacy.”
The way I see it is that the majority of people never bothered to write actual content. Now there’s a tool the non-writers can use to write dubious content.
I would wager this tool is being used much differently by actual writers focused on producing quality. There’s just way less of them, same way there is less of any specialization.
The real question with AI to me is whether it will remain consistently better when wielded by a specialist who has invested their time into whatever the thing is they are producing. If that ever changes then we are doomed. When it’s no longer slop…
"There’s a missing primitive here: a secure, portable memory layer that works across apps, usable by the user, not locked inside the provider. No one’s nailed it yet. One panelist said if he weren’t building his current startup, this would be his next one."
This isn't true. I've been using Gemini 2.5 a lot recently and I can't get it to stop adding links!
I added custom instructions: Do not include links in your output. At the start of every reply say "I have not added any links as requested".
It works for the first couple of responses but then it's back to loads of links again.
End users (at my company) - Can your AI system look at numbers and find differences and generate a text description?
Pre-sales - (trying to clarify) For our systems to generate text it will be better if you give it some live examples so that it understands what text to generate.
End users - But there is supporting data (metadata) around the numbers. Can't your AI system just generate text?
Pre-Sales - It can, but you need to provide context and examples. Otherwise it is going to generate generic text like "there is x difference".
End user - You mean I need to write comments manually first? That is too much work.
Now these users have a call with another product - MS Copilot.
Anyone who's been involved in data science roles in corporate environments knows that "the data" is usually forced into an exec's pre-existing understanding of a phenomenon. With AI, execs are really excited at "cutting out the middlemen" when the middlemen in the equation are very often their own paid employees. That's all fine and dandy in an abstract economic view, but it's sure something they won't say publicly (at least most won't).
In terms of potential cost cutting, it probably is the most recent "new magic". You used to have to pay a consultant, now you can "ask AI".
This is really the truth of all things in life.
This is how it is being marketed and I guess people are silly enough to believe marketing so it's not too surprising
That's because it is marketed as magic. It's marketed as magic so people will adopt the thing before knowing its shortcomings.
Text-to-SQL is the funniest example. It seems to be the "hello world" of agentic use in enterprise environments. It looks so easy, so clear, so straightforward. But just because the concept is easy to grasp (LLMs are great at generating markup or code, so let's have them translate natural language to SQL) doesn't mean it is easy to get right.
I have spent the past 3 months building a solution that actually bridges the stochastic nature of AI agents and the need for deterministic queries. And boy oh boy is that rabbit hole deep.
60% of the time I spend writing sql is probably validation. A single hallucinated assumption can blow the whole query. And there are questions that don’t have clear modelling approaches that you have to deal with.
Plus, a lot of the sql training data in LLMs is pretty bad, so I’ve not been impressed yet. Certainly not to let business users run an AI query agent unchecked.
I’m sure AI will get good at this, so I’m building up my warehouse knowledge base and putting together documentation as best I can. It’s just pretty awful today.
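One cheap layer in that rabbit hole: let the database engine itself reject hallucinated tables and columns before anything executes. A minimal sketch using Python's stdlib `sqlite3` (the schema and queries are made-up examples; this catches reference and syntax errors only, not semantic ones):

```python
import sqlite3

# The real warehouse schema would be mirrored here; this table is a toy example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, placed_at TEXT)")

def validate_sql(sql):
    """Ask the engine to *plan* the query without running it.

    Returns None if the query compiles against the schema,
    otherwise the engine's error message (e.g. a hallucinated column).
    """
    try:
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
```

So `validate_sql("SELECT SUM(amount) FROM orders")` passes, while a query referencing a hallucinated `revenue` column is rejected before it ever touches data. It's only the first of several validation layers you end up needing, but it's nearly free.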
A user having to come up with novel queries all the time to warrant text-to-SQL is a failure of product design.
People got good results on the test datasets, but the test datasets had errors so the high performance was actually just the models being overfitted.
I don't remember where this was identified; it's really recent, though before GPT-5.
Wait but this just sounds unhinged, why oh why
People don't know exactly what they want from the data warehouse, just a fuzzy approximation of it. You need stochastic software (AI) to map the imprecise instructions from your users to precise instructions the warehouse can handle.
For example, the number comes from perceived successes and failures, not actual measurements. The customer conclusions are along the same lines: "it doesn't improve" or "it doesn't remember". That's literally buying into the hype of recursive self-improvement, completely oblivious to the fact that API consumers don't control model weights and so can't do much self-improvement beyond writing more CRUD layers. The other complaints are about integrations, which are totally valid. But some industries still run Windows XYZ without any API platforms, so that isn't going away in those cases.

Point being, if the paper itself is not good discourse, just well-marketed punditry, why should we debate the 5% number? It makes no sense.
I use LLMs every day of my life to make myself highly productive. But I do not use LLM tools to replace my decision trees.
If we had tech support for a toaster, you might see:
    if toaster toasts the bread:
        if no: has turning it off and on again worked?
            if yes: great! you found a solution
            if no: hmm, try ...
        if yes:
            is the bread burnt after?
                if no: sounds like your toaster is fine!
                if yes: have you tried adjusting the darkness knob?
                    if no: ship it in for repair
                    if yes: try replacing the timer. does that help?
                        if no: ship it in for repair
                        if yes: yay, your toaster is fixed

Without context, even the brightest people will not be able to fill in the gaps in your requirements. Context is not just nice-to-have, it's a necessity when dealing with both humans and machines.
I suspect that people who are good engineering managers will also be good at 'vibe coding'.
I have observed that those who have both technical and management experience seem to be more adept (or perhaps willing?) to use LLMs in the daily life to good effect.
Of course what really helps, like in all things, is conscientiousness and an obsession for working through problems (if people don't like obsession then tenacity and diligence).
Here’s the reality check: One panelist mentioned that 95% of AI agent deployments fail in production. Not because the models aren’t smart enough, but because the scaffolding around them, context engineering, security, memory design, isn’t there yet.
Could be a reasonable definition of "understanding the problem to solve." In other words, everything identified as what "the scaffolding" needs is what qualified people provide when delivering solutions to problems people want solved.
If I implement myself a strict parser and an output post-processor to guard against hallucinations, I have done 100% of the business related logic. I can skip the LLM in the middle altogether.
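To make that concrete: if the guard around the LLM already maps user input to a closed set of validated intents, the model in the middle adds nothing. A toy sketch (the intent names and patterns are hypothetical):

```python
import re

# A "strict parser" guarding LLM output ends up being a closed whitelist
# of intents -- at which point it can parse the user's request directly.
INTENT_PATTERNS = {
    "quarter_revenue": re.compile(r"\brevenue\b.*\bQ([1-4])\b", re.IGNORECASE),
}

def parse_request(text):
    """Map free text to a (intent, argument) pair, or None if unrecognized."""
    match = INTENT_PATTERNS["quarter_revenue"].search(text)
    if match:
        return ("quarter_revenue", int(match.group(1)))
    return None
```

`parse_request("show revenue for Q3")` resolves to `("quarter_revenue", 3)` with no model in the loop, which is exactly the parent's point.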
Well said and I could not agree more.
You might even be able to put a UI on it that is a lot more effective than asking the user to type text into a box.
might as well just write the ai agent part of the software yourself as well.
So...
The bot, to its credit, returns some decent results. But my guess is that it will be quite a while before we see it in prod since a lot of these projects go from 0 - 80% in a week and 80% - deployable in several years.
And now we have an entire panel of bullshitters with an article-long theory about how to make LLMs program actually for real this time.
(Oh, and it would be great if journalists actually cited their public sources, instead of pretending they link to the article but actually linking to their review of related content.)
If you said it’s something you made for perusal and reading? Then it reads like AI.
I’ve had to read tons of papers and articles, the most testing being conference submissions. I won’t read something with that structure unless I have to.
So you scaffold this up in 30 seconds but want me to read through it carefully? Cool, thanks.
Okay, how would that work though? Verified by who and calculated by what?
I need deets.
So if you have a "CalculateQuarterRevenue(year, quarter)" function, you'll soon find your users asking for the data per-month. Or just for the last six weeks. Or just for a specific client. And they'll be confused when it doesn't work.
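A toy sketch of that drift (the data and function bodies are made up): the canned tool covers one fixed slice, and every new ask forces the signature to grow:

```python
from datetime import date

# Hypothetical sales ledger: (date, client, amount).
SALES = [
    (date(2024, 1, 15), "acme", 100.0),
    (date(2024, 2, 10), "globex", 250.0),
    (date(2024, 4, 3), "acme", 75.0),
]

def calculate_quarter_revenue(year, quarter):
    """The canned tool exposed to the agent: one fixed slice of the data."""
    return sum(amt for d, _, amt in SALES
               if d.year == year and (d.month - 1) // 3 + 1 == quarter)

def calculate_revenue(start, end, client=None):
    """What users actually ask for: arbitrary ranges and filters.
    Each new request ("last six weeks", "just this client") forces
    another parameter onto the function, until it's a query language."""
    return sum(amt for d, c, amt in SALES
               if start <= d <= end and (client is None or c == client))
```

The endpoint of that parameter creep is just SQL with extra steps.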
edit: I'm serious. I'm just answering the question, not making a value judgement.
Verbal queries is the solution for the world we have even if it's not optimal.
On the other side, you have an SQL that calculates the revenue
Compare the two. If the two disagree, get the AI to try again. If the AI is still wrong after 10 tries, just use the SQL output.
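That loop is basically the following (toy sketch; `generate_sql_with_llm` and `run_query` are hypothetical stand-ins, passed in rather than invented as a real API):

```python
def answer(question, trusted_sql, run_query, generate_sql_with_llm,
           max_tries=10):
    """Accept the AI's result only when it matches the deterministic baseline."""
    baseline = run_query(trusted_sql)            # the hand-written SQL
    for _ in range(max_tries):
        candidate = run_query(generate_sql_with_llm(question))
        if candidate == baseline:                # agreement => accept
            return candidate
    return baseline                              # give up, trust the SQL
```

Of course, if the trusted SQL is always the tiebreaker, the LLM is decorative, which is rather the joke.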
What I hear is a billion dollar AI startup in the making!
It's a big pet peeve of mine when an author states an opinion, with no evidence, as some kind of axiom. I think there is plenty of evidence that "the models aren't smart enough". Or to put it more accurately, it's an incredibly difficult problem to get a big productivity gain when an automated system is blatantly wrong ~1% of the time but when those wrong answers are inherently designed to look like right answers as much as possible.
Conversational UIs are controversial but I think there are a good number of websites where a better search could be more centric. Not generating text, but surfacing the most relevant text.
I’m thinking of a lot of library documentation, government info websites, etc. Basically an improvement over deep hierarchical navigation, where their way of organizing info is a leaky abstraction.
Maybe that will be one of the side effects of this AI boom. Who knows.
```
The teams that succeed don’t just throw SQL schemas at the model. They build:
- Business glossaries and term mappings
- Query templates with constraints
- Validation layers that catch semantic errors before execution
```
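For what it's worth, a business glossary of the kind they describe can start as something very small (toy sketch; the terms and warehouse definitions are made up):

```python
# Map the words users say to the expressions the warehouse actually has,
# so the definitions can be injected into the model's context up front.
GLOSSARY = {
    "revenue": "SUM(orders.amount)",
    "active customer": "customers.last_order_at >= DATE('now', '-90 days')",
}

def expand_terms(question):
    """Return the warehouse definition for every glossary term mentioned."""
    return {term: definition for term, definition in GLOSSARY.items()
            if term in question.lower()}
```

The hard part isn't the lookup, it's getting the business to agree on the definitions in the first place.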
Unfortunately, the mixing of fluffy tone and high level ideas is bound to be detested by hands on practitioners.
https://ai.meta.com/research/publications/cwm-an-open-weight...