Am I wrong to think that the answer is obvious? I mean, who wants web apps to behave differently every time you interact with them?
You or your coworker are not a web app. You can do some of the things that web apps can, and many things that a web app can't, but neither is because of the modality.
Coded determinism is hard for many problems, and I find it entirely plausible that it could turn out to be the wrong approach for software that is designed to solve some class of complex problems more generally. Average humans are pretty great at solving a certain class of complex problems that we tried to tackle unsuccessfully with many millions of lines of deterministic code, or simply have not had a handle on at all (like building a great software CEO).
Talk about a nonsensical non-sequitur, but I’ll bite. People want those to be deterministic too, to a large extent.
When people cook a meal with the same ingredients and the same times and processes (like parameters to a function), they expect it to taste about the same, they never expect to cook a pizza and take a salad out of the oven.
When they have sex, people expect to ejaculate and feel good, not have their intercourse morph into a drag race with a clown halfway through.
And when they want a “solution”, they want it to be reliable and trustworthy, not have it shit the bed unpredictably.
Okay but when I start my car I want to drive it, not fuck it.
Are you suggesting that an average user would want to precisely describe in detail what they want, every single time, instead of clicking on a link that gives them what they want?
Websites are tools. Tools being non-deterministic can be a really big problem.
So even if it would be better to have more flexibility, most businesses won't want it.
And don't even try to claim there won't ever be any regressions: current LLM-based A.I. will 'happily' lie to you that it passed all tests -- because based on past interactions, it has.
For a typical user today’s software isn’t particularly deterministic. Auto updates mean your software is constantly changing under you.
LLMs being inherently non-deterministic means using this technology as the foundation of your UI will mean your UI is also non-deterministic. The changes that stem from that are NOT from any active participation of the authors/providers.
This opens a can of worms where there will always be a potential for the LLM to spit out extremely undesirable changes without anyone knowing. Maybe your bank app one day doesn't let you access your money. This is a danger inherent and fundamental to LLMs.
The LLM example gives you a completely different UI on _every_ page load.
That’s very different from companies moving buttons around occasionally and rarely doing full redesigns.
1. Outputting text (or, sometimes, images).
2. No long term storage except, rarely, closed-source "memory" implementations that just paste stuff into context without much user or LLM control.
This is a really neat glimpse of a future where LLMs can have much richer output and storage. I don't think this is interesting because you can recreate existing apps without coding... But I think it's really interesting as a view of a future with much richer, app-like responses from LLMs, and richer interactions — e.g. rather than needing to format everything as a question, the LLM could generate links that you click on to drill into more information on a subject, which end up querying the LLM itself! And similarly it can ad-hoc manage databases for memory+storage, etc etc.
LLM is just one model used in A.I. It's not a panacea.
For generating deterministic output, probably a combination of Neural Networks and Genetic Programming will be better. And probably also much more efficient, energy-wise.
Not quite the case. Temperature 0 is not the same as a fixed random seed. Also, there are downsides to lowering temperature (always choosing the most probable next token).
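A toy sketch of the difference (hand-rolled PRNG and made-up logits, nothing from a real model): temperature 0 collapses sampling to argmax, so the seed stops mattering, while a fixed seed at temperature > 0 makes the draw reproducible without forcing the most probable token every time.

```typescript
// A tiny deterministic PRNG (mulberry32) standing in for the sampler's seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Scale logits by temperature and normalize into a probability distribution.
function softmax(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// temperature === 0 means greedy argmax; otherwise sample from the scaled distribution.
function sampleToken(logits: number[], temperature: number, rand: () => number): number {
  if (temperature === 0) {
    return logits.indexOf(Math.max(...logits)); // greedy: the seed is irrelevant here
  }
  const probs = softmax(logits, temperature);
  const r = rand();
  let cum = 0;
  for (let i = 0; i < probs.length; i++) {
    cum += probs[i];
    if (r < cum) return i;
  }
  return probs.length - 1;
}
```

So temperature 0 gives you the same token regardless of seed, but seeding at temperature 1 gives you a reproducible draw that can still land on a less-probable token.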
We absolutely should want developers to think.
Technically everyone, we stopped using static pages a while ago.
Imagine pages that can now show you e.g. infinitely customizable UI; or, more likely, extremely personalized ads.
Product owners were happy.
Until users came for us with pitchforks as they didn’t want stuff to change constantly.
We backed out to releasing on monthly cadence.
When I go to the DMV website to renew my license, I want it to renew my license every single time.
Yeah, NO.
The hard part (coming from this direction) is enshrining the translation of specific user intentions into deterministic outputs, as others here have already mentioned. The hard part when coming from the other direction (traditional web apps) is responding fluidly/flexibly, or resolving the variance in each user's ability to express their intent.
Stability/consistency could be introduced through traditional mechanisms (encoded instructions systematically evaluated) or, via the LLM's language interface, through intent-focusing mechanisms: increasing the prompt length / hydrating the user request with additional context/intent: "use this UI, don't drop the db."
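The "hydration" step might look like nothing more than string assembly; the `AppContext` fields and the constraint wording here are invented for illustration:

```typescript
// Sketch of hydrating a raw user request with stabilizing context before it
// reaches the model. Nothing here is from the actual project.

interface AppContext {
  uiTheme: string;        // e.g. a previously generated stylesheet
  schemaSummary: string;  // short description of the current DB schema
}

function hydratePrompt(userRequest: string, ctx: AppContext): string {
  const constraints = [
    `Reuse the existing UI theme: ${ctx.uiTheme}.`,
    `The database schema is: ${ctx.schemaSummary}. Do not alter or drop it.`,
    `Respond only with HTML for this request.`,
  ];
  return `${constraints.join("\n")}\n\nUser request: ${userRequest}`;
}
```

The point is that the user still speaks freely; the system quietly pins down everything the user didn't say.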
From where I'm sitting, LLMs provide a new modality for evaluating intent. How we act on that intent can be totally fluid, totally rigid, or, perhaps obviously, somewhere in between.
Very provocative to see this near-maximum example of non-deterministic, fluid intent-interpretation-to-execution. Thanks, I hate how much I love it!
I thought this didn't work? You basically end up fitting your AI models to whatever the internal evaluation method is, and creating a good evaluation method most often ends up having similar complexity to creating the initial AI model you wanted to train.
Maybe the browser should learn to talk back.
You could store the pages in the database and periodically generate a new version based on the current set of pages and the share of traffic they enjoy. You would get something that evolves and stabilizes in some niche. Have an initial prompt like "dinosaurs!", then sit back and watch the magic unfold.
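The traffic-weighted selection step could be sketched like this (the `StoredPage` shape and the fitness-proportional pick are my own guesses at one reasonable implementation):

```typescript
// Pick a "parent" page for the next generation with probability proportional
// to the traffic it received, genetic-algorithm style.

interface StoredPage {
  html: string;
  hits: number; // traffic share this page enjoyed
}

function pickParent(pages: StoredPage[], rand: () => number): StoredPage {
  const total = pages.reduce((s, p) => s + p.hits, 0);
  let r = rand() * total;
  for (const p of pages) {
    r -= p.hits;
    if (r < 0) return p;
  }
  return pages[pages.length - 1]; // guard against floating-point edge cases
}
```

The picked parents (plus the original "dinosaurs!" prompt) would then be fed back to the model to produce the next variant, so popular pages dominate the gene pool.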
Why would I need programs with colors, buttons, actual UI ?
I am trying to imagine a future where file navigators don't even exist: "I want to see the photos I took while I was on vacation last year. Yes, can you remove that cloud? Perfect, now send it to XXXX's computer and say something nice."
"Can you set some timers for my sport session? Can you plan a pure bodyweight session? Yes, that's perfect. Wait, actually, remove the jumping jacks."
"Can you produce a Detroit-style techno beat? I feel like dancing."
"I feel life is pointless without work; can you give me some tasks to achieve that would give me a feeling of fulfillment?"
"Can you play an arcade-style video game for me?"
"Can you find me a mate for tonight? Yes, I prefer black-haired persons."
Better yet, why exercise, which is so repetitive, if we can create a machine that just does it for you, including the dopamine triggering? Why play an arcade video game when we can create a machine that fires the neurons needed to produce the exact same level of excitement as the best video game?
And why find mates when my robot can morph into any woman in the world, or, better yet, when brain implants trigger the exact same feelings as having sex and love?
Bleak. We are oversimplifying existence itself, and it doesn't lead to a nice place.
"Make me happy"
"Make me happy"
"Make me happy"
We have already been on this path for many, many years, certainly decades if not centuries, although availability was definitely spotty in the past.
It is also kind of impossible to hop off this train: while it is individually possible to reject any of these conveniences, in general they just become a part of life. Which is not necessarily a bad thing, just different.
Contextualized to "web apps," as you have: navigating a list maybe requires an interface. It would be fairly tedious to differentiate between, for example, the 30 pairs of pants your computer has shown you after you asked "help me buy some pants" without using a UI (ok, maybe eye-tracking?).
We just gloss over the details in these hypothetical irregular or abstract tasks because we imagine they would be done as we imagine them. We don’t have experience trying to tell the damn AI to not delete that cloud (which one exactly?) but the other one via a voice UI. Which would suck and be super irritating, btw.
We know how irritating it would be to turn the shower off/on, because we do that all the time.
As for repetitive tasks, you can just explain a "common procedure" to your computer?
No matter how capable the friend is, it's oftentimes easier to do a task directly in a UI than to have to verbalize it to someone else.
And there are people that unfortunately cannot speak.
> “That’s right,” I said, “or even worse, it could be perfect.”
-- William Gibson: The Gernsback Continuum
I realize it sounds inhuman, but so is working in enterprise IT! :)
Growing the food that a human eats, running the air conditioning for their home, powering their lights, fueling their car, charging their phone, and all the many many things necessary to keep a human alive and productive in the 21st century are a larger resource cost than almost any machine/system that performs the same work. From an efficiency perspective, automation is almost always the answer. The actual debate comes from the ethical perspective (the innate value of human life).
Multiply that by dozens or hundreds of self-updating programs on a typical machine. Absolutely insane amounts of resources.
Some ideas: use a slower 'design' model at startup to generate the initial app theme and DB schema, and a 'fast' model for responses. I tried a version using PostgREST so the logic was entirely in the DB, but then it got too complicated: either the design model failed to one-shot a valid schema, or the fast model kept generating invalid queries.
I also use some well known CSS libraries and remember previous pages to maintain some UI consistency.
It could be an interesting benchmark, an "App Bench": how well can an LLM one-shot a working application?
What if we ran AI locally and used it to actually do labor-intensive things with computers that make money rather than assuming everything were web-connected, paywalled, rate-limited, authenticated, tracked, and resold?
I don't see the point in using probabilistic methods to perform deterministic logic. Even if its output is correct, it's wasteful.
But there is a kicker here. It is up to the LLM to discover the right abstractions for "thinking" while serving the requests directly or in the code.
Coming up with the right abstraction is not a small thing. Just see what git is over CVS: without git, no one would have even imagined microservices. The right abstraction cuts through the problem, not just now but in the future too. And that can only happen if the LLM/AI managing the app is really smart, deals with the real world for a long time, and makes the right connections; these insights don't even come to really smart people that easily!
LLM is just a tool in the A.I. world. There are lots of other A.I. tools, such as Neural Network, Fuzzy Logic, Genetic Programming, and so on.
I did a POC for this in July - https://www.ohad.com/2025/07/10/voidware/
On the one hand, there’s "classical" software that is developed here and deployed there — if you need a change, you go over to the developers, ask for a change & deploy, and thus get the change into your hands. The work of the developers might be LLM-assisted, but that doesn’t change the principle.
The other extreme is what has been described here, where the LLM provides the software "on the fly".
What I’m imagining is software, deployed on a system and provided in the usual way — say, a web application for managing inventory.
Now, you use this software as usual.
However, you can also "meta-use" the software, as in: you click a special button, which opens a chat interface to an LLM.
But the trick is, you don’t use the LLM to support your use case (as in "Dear LLM, please summarize the inventory").
Instead, you ask the LLM to extend the software itself, as in: "Dear LLM, please add a function that allows me to export my inventory as CSV".
The critical part is what happens behind the scenes: the LLM modifies the code, runs quality checks and tests, snapshots the database, applies migrations, and then switches you to a "preview" of the new feature, on a fresh, dedicated instance, with a copy of all your data.
Once you are happy with the new feature (maybe after some more iterations), you can activate/deploy it for good.
I imagine this could be a promising strategy to turn users into power-users — but there is certainly quite some complexity involved in getting it right. For example, what if the application has multiple users, and two users want to change the application in parallel?
Nevertheless, shipping software together with an embedded virtual developer might be useful.
CEO stops reading, signs a contract, and fires all developers.
> It's just catastrophically slow, absurdly expensive, and has the memory of a goldfish.
Reality sinks in two months later.
With a system prompt like "You're an HTTP server for a twitter clone called Gwitter," you can interact directly with the LLM from a browser.
Of course it was painfully slow, quickly went off the rails, and revealed that LLMs are bad at business logic.
But something like this might be the future. And on a longer time horizon, mentioned by OP and separately by sama, it may be possible to render interactive apps as streaming video and bypass the browser stack entirely.
So I think we’re at the Mother of All Demos stage of things. These ideas are in the water but not really practical today. As with MoaD, it may take another 25 years for them to come to fruition.
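The Gwitter experiment boils down to a few lines; `complete` below stands in for whatever chat-completion client you use (stubbed here, since the provider call is the only non-trivial part, and the exact system prompt is only a paraphrase):

```typescript
// "The LLM is the HTTP server": every request line is forwarded to a
// completion function, and whatever comes back is served as the response body.

type Complete = (system: string, user: string) => string;

const SYSTEM_PROMPT =
  "You're an HTTP server for a twitter clone called Gwitter. " +
  "Reply with a full HTML document for every request.";

function handleRequest(method: string, path: string, complete: Complete): string {
  // No routes, no controllers, no business logic: the model is all of them.
  return complete(SYSTEM_PROMPT, `${method} ${path}`);
}
```

Everything the comment describes (slowness, going off the rails, bad business logic) lives inside `complete`; the surrounding "server" is trivial.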
On the other hand, improvements to "AI" of similar scales are very much uncertain. We have seen moderate improvements from brute force alone, i.e. by throwing more data and compute at the problem, but this strategy has reached diminishing returns, and we have been at a plateau for about a year now. We've seen improvements by applying better engineering (MCP, "agents", "skills", etc.), but have otherwise seen the same tech demos in search of a problem, with a bit more polish at every iteration.
There's no doubt that statistical models are a very useful technology with many applications, some of which we haven't discovered yet. But given the technology we have today, the claim that something like it could be used to generate interactive video which could be used instead of traditional software is absurd. This is not a matter of gradual iterations to get there—it would require foundational breakthroughs to work even remotely reliably, which is as uncertain as LLMs were 10 years ago.
In any case, whatever sama and his ilk have to say about this topic is hardly relevant. These people would say anything to keep the hype-driven valuation pump going.
These models can’t even do continuous learning yet. There’s no evidence that the current tech will ever evolve beyond what it is today.
Not to mention that nobody is asking for any of this.
(I also just thought of that episode about Moriarty, a Holodeck character, taking over the ship by tricking the crew. It doesn't seem quite so far-fetched anymore!)
I did a version of this where the AI writes tools on the fly but gets to reuse them on future calls, trying to address the cost / performance issues. Migrations are challenging because they require some notion of an atomic update across the db and the tools.
This is a nice model of organically building software on the fly and even letting end users customize it on the fly.
I guess any user can just run something like /api/getdatabase/dumppasswords and it will give them the passwords?
or /webapp?html=<script>alert()</script> and run arbitrary JS?
I'm surprised nobody mentioned that security is a big reason not to do anything like this.
I'm not entirely sure why I had an urge to write this.
When the god rectangle fails, there is literally nobody on earth who can even diagnose the problem, let alone fix it. Reasoning about the system is effectively impossible. And the vulnerability of the system is almost limitless, since it’s possible to coax LLMs into approximations of anything you like: from an admin dashboard to a sentient potato.
“zero UI consistency” is probably the least of your worries, but object permanence is kind of fundamental to how humans perceive the world. Being able to maintain that illusion is table stakes.
Despite all that, it’s a fun experiment.
For me it is predictability. I am a big proponent of AI tools. But even the biggest proponents admit that LLMs are non-deterministic. When you ask a question, you are not entirely sure what kind of answers you will get.
This behavior is acceptable as a developer assistance tool, when a human is in the loop to review and the end goal is to write deterministic code.
Whereas that sort of evaluation is trivial with code (even if at times program execution is non-deterministic), because its mechanics are explainable. Things like only testing boundary conditions hinge on this property, but completely fall apart if it’s all probabilistic.
Maybe explainable AI can help here, but to be honest I have no idea what the state of the art is for that.
Kind of like saving a game before taking on a boss. If things go haywire, just reload. Or maybe like cooking? If something went catastrophically wrong, just throw it out and start from the beginning (with the same tools!)
And I think the only way to even halfway mitigate the vulnerability concern is to identify that this hypothetical system can only serve a single user. Exactly 1 intent. Totally partitioned/sharded/isolated.
If you were using your own model you could maybe try to retrain/finetune the issues away given a new dataset and different techniques? But at that point you’re just transmuting a difficult problem into a damn near impossible one?
LLMs can be miraculous and inappropriate at the same time. They are not the terminal technology for all computation.
I think part of the issue is that most frameworks really suck. Web programming isn't that complicated at its core, the overengineering is mind boggling at times.
Thinking in the limit, if you have to define some type of logic unambiguously, would you want to do it in English?
Anyway, I'm just thinking out loud, it's pretty cool that this works at all, interesting project!
What these LLMs continue to prove, though, is that they are no substitute for real domain knowledge. To date, I've yet to have a model implement Raft consensus correctly in my testing of whether they can build a database.
The way I interact with these models is almost adversarial in nature. I prompt them with the bare minimum that a developer might get in a feature request. I may even have a planning session to populate the context before I set it off on a task.
The bias in these LLMs really shines through, and proves their autocomplete properties, when they have a strong bias towards changing the one snippet of code I wrote because it doesn't fit how their training data would suggest the shape of their code should be. Most models will course-correct with instructions that they are wrong and I am right, though.
One thing I've noted is that if you let it generate choices for you from the start of a project, it will make poor choices in nearly every language. You can be using uv to manage a Python project and it will continue to try using pip or python commands. You can start an Electron app and it will continuously botch whether it's using CommonJS or some other standard. It persistently wants to download Go modules before coding instead of just writing the code and doing `go mod tidy` after (it literally doesn't need the module in advance, and it doesn't even have tools to probe the module before writing the code anyway).
Raft consensus is my go-to test because there is no one-size-fits-all way to implement it. It might get an in-memory key store right, but what if you want it to organize etcd/raft/v3 in a way that lets you do multi-group Raft? What if you need Raft to coordinate some other form of data replication? None of these LLMs can really do it without a lot of prep work.
This is across all the models available from OpenAI, Claude, and Google.
Would be cooler if support for local llms was added. Currently only has support for anthropic and openai. https://github.com/samrolken/nokode/blob/main/src/config/ind...
Each person gets their own cache. The format of the cache is a git repo tied to their sessionid. Each time a request is made it writes the code, html, CSS, and database to git and commits it. Over time you build more and more artifacts and fewer things need to get generated JIT. Should also help with stability.
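A rough sketch of that lookup-before-generate flow, with in-memory stand-ins for both the git repo and the LLM call (the real version would shell out to git and hit a model):

```typescript
// Per-session artifact cache: serve committed artifacts when they exist,
// JIT-generate (and "commit") them only on a miss.

type Generate = (route: string) => string;

class SessionCache {
  private artifacts = new Map<string, string>();
  readonly history: string[] = []; // stand-in for git commits

  constructor(readonly sessionId: string) {}

  serve(route: string, generate: Generate): string {
    const hit = this.artifacts.get(route);
    if (hit !== undefined) return hit; // cache hit: no LLM call needed
    const artifact = generate(route);  // JIT-generate once...
    this.artifacts.set(route, artifact);
    this.history.push(`commit: ${this.sessionId} ${route}`); // ...then "commit" it
    return artifact;
  }
}
```

Over time the generate function is called less and less, which is exactly the stability win the comment describes.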
Let's say, in the future, when AI learns how to build houses, every time I want to sleep, I'll just ask the AI to build a new house for me, so I can sleep. I guess it will have to repurpose the old one, but that isn't my concern, it's just some implementation detail.
Wouldn't that be nice?
Every night, new house?
I guess there are many of us out there with these same thoughts/ideas and you've done an awesome job articulating and implementing it, congrats!
I’ve been reading through all the comments and the range of responses is really great, and I'm so thankful to everyone for taking the time to comment... from “this is completely impractical” to “but what if we cached the generated code?” to “why would anyone want non-deterministic behavior?” All valid! Though I think some folks are critiquing this as if I was trying to build something production-ready, when really I was trying to build something that would break in instructive ways.
Like, the whole point was to eliminate ALL the normal architectural layers... routes, controllers, business logic, everything, and see what happens. What happens is: it’s slow, expensive, and inconsistent. But it also works, which is the weird part. The LLM designed reasonable database schemas on first request, generated working forms from nothing but URL paths, returned proper JSON from API endpoints. It just took forever to do it. I kept the implementation pure on purpose because I wanted to see the raw capabilities and limitations without any optimizations hiding the problems.
And honestly? I came away thinking this is closer to viable than it should be. Not viable TODAY. Today it’s ridiculous. But the trajectory is interesting. I think we’re going to look back at this moment and realize we were closer to a real shift than we thought. Or maybe not! Maybe code wins forever. Either way, it was a fun weekend. If anyone wants to discuss this or work on projects that respond faster than 30 seconds per request, I’m available for full stack staff engineer or tech co-founder work: sam@samrolken.org or x.com/samrolken
This project could use something like that. Perhaps ask the LLM to implement a way to store/cache the snapshots of its previous answers. That way, the more you use it, the faster it becomes.
Once the dust settles prices will go up. Even if running models will be cheaper they will need to earn back all the burned cash.
I’d much rather vibe code an app and get the code to run on some server.
I can get GPT-3 levels of quality with Qwen 8B, even Qwen 4B in some cases.
Because there are times when you use code in order to generate content. For instance, a complicated document in a content-creation application. (Anything: graphics, music, corporate documents, ...)
Suppose that, on the spot, AI writes you a software suite in which you create a document.
Do you dare throw that suite away, hoping that AI will write a compatible one tomorrow which can still open and correctly handle all details of that complex document?
Now what if you ask it to optimize itself? Instead of just:
prompt: `Handle this HTTP request: ${method} ${path}`,
Append some simple generic instructions to the prompt that it should create a code path for the request if it doesn't already exist, and list all existing functions it has already created along with the total number of times each one has been called, or something like that. Even better, have it create HTTP routings automatically to bypass the LLM entirely once they exist. Or, do exponential backoff: the first few times an HTTP request is made where a routing exists, still have the LLM verify that the results are correct, but decrease the frequency as long as verifications continue to pass.
I think something like this would allow you to create a version that might then be performant after a while...?
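The backoff idea could be sketched roughly like this (the power-of-two schedule and the `dispatch` shape are my own guesses at one reasonable implementation):

```typescript
// Self-optimizing router: generated code paths bypass the LLM, but are still
// re-verified with exponentially decreasing frequency (calls 1, 2, 4, 8, ...).

type Handler = (path: string) => string;

interface Route {
  handler: Handler;
  calls: number;
}

const routes = new Map<string, Route>();

function shouldVerify(callCount: number): boolean {
  // verify when the call count is a power of two
  return callCount > 0 && (callCount & (callCount - 1)) === 0;
}

// Returns [response, needsLLMVerification].
function dispatch(path: string, llmFallback: Handler): [string, boolean] {
  const route = routes.get(path);
  if (!route) {
    // No code path yet: the LLM handles it (and would be asked to create one).
    return [llmFallback(path), false];
  }
  route.calls++;
  return [route.handler(path), shouldVerify(route.calls)];
}
```

As long as verifications keep passing, the gaps between checks double, so a stable route converges to essentially zero LLM cost.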
In fact, this thought has been percolating in the back of my mind but I don't know how to process it:
If LLMs were perfectly deterministic - i.e. for the same input we get the same output - and we actually started memoizing results for input sets by materializing them - what would that start to resemble?
I feel as though such a thing might start to resemble the source information the model was trained on. The fact that the model compresses all the possibilities into a limited space is exactly what makes it valuable: instead of having to store every input, function body, and output that an LLM could generate, it just stores the model.
But this blows my mind somehow because if we DID store all the "working" pathways, what would that knowledgebase effectively represent and how would intellectual property work anymore in that case?
Thinking about functional programming, I see the potential to think of the LLM as the "anything" function, where a deterministic seed and input always produce the same output, with a knowledgebase of pregenerated outputs used to speed up retrieval of acceptable results for a given seed and set of inputs... I can't put my finger on it. Is it basically just a search engine then?
Let me try another way...
If I ask an LLM to generate a function for "what color is the fruit @fruit?", where @fruit is the variable, and I memoize that @fruit = banana + seed 3 is "yellow", then the set of the prompt, input @fruit = banana, seed = 3, output = "yellow" is now a fact that I could just memoize.
Would that be faster to retrieve the memoized result than calculating the result via the LLM?
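A minimal sketch of that memoization, assuming the model call really is a pure function of (prompt, input, seed) — a toy model stands in for the real one:

```typescript
// Memoize deterministic LLM calls: key by (prompt, input, seed) and only
// invoke the model on a cache miss. With a fixed seed the triple fully
// determines the output, so the cache is sound.

type Model = (prompt: string, input: string, seed: number) => string;

const memo = new Map<string, string>();

function memoizedAsk(model: Model, prompt: string, input: string, seed: number): string {
  const key = JSON.stringify([prompt, input, seed]);
  const hit = memo.get(key);
  if (hit !== undefined) return hit; // cheap map lookup instead of inference
  const out = model(prompt, input, seed);
  memo.set(key, out);
  return out;
}
```

The lookup is a hash-map access, so yes, retrieving the memoized result is vastly cheaper than recomputing it through the model; the open question is only how big the materialized key space gets.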
And, what do we do with the thought that that set of information is "always true" with regards to intellectual property?
I honestly don't know yet.
1. If code generation eventually works without human intervention, and every Google search could theoretically produce a real-time, custom-generated page, does that mean we no longer need people to build websites at all? At that point, “web development” becomes more like intent-shaping rather than coding.
2. I’m also not convinced that chat is the ideal interface for users. Natural language feels flexible, but it can also be slow, ambiguous, and cognitively heavier than clicking a button. Maybe LLM-driven systems will need new UI models that blend conversation with more structured interaction, instead of assuming chat = the future.
Curious how others here think about those two points.
The only value of an LLM generating a realistic HTML page as an answer is to make it appear as though the answer was found on a preexisting page, lending the answer some level of validity.
If users really are fine with the LLM just generating the answer on the fly, doing so in HTML is completely unnecessary. Just give the user answers in text form.
What people want isn’t code - they want computers to do stuff for them. It just happens that right now, code is the best way you can do it.
The paradigm WILL change. It’s really just a matter of when. I think the point you make that these are problems of DEGREE, not problems of KIND is very important. It’s plausible, now it’s just optimization, and we know how that goes and have plenty of history to prove we consistently underestimate the degree to which computation can get faster and cheaper.
Really cool experiment!
I certainly wouldn't want a patient healthcare system that might return slightly different results, or store the data in a different format, each time you make a request. Code is and will continue to be the best way to build deterministic computer information systems, regardless of whether it's generated by humans or AI.
So I guess it's kind of like v0.
And it reminded me a little of NeuralOS, which appeared here a couple months ago [1]. NeuralOS is different, though, as they decided to skip even the UI code, and generate the UI itself based on intent.
Maybe together with your approach we can finally reproduce all the funny holodeck bugs from Star Trek!
Somehow it will also help you decide what is needed for an MVP. Instead of building everything you think you will need, you get only what you need. But if I use someone else's application running this repo, the first thing I'll do is go to /admin/users/all.
I'm using a similar approach in an app I'm building. Seeing how well it works, I now really believe that in the coming years we'll see a lot of "just-in-time generation" for software.
If you haven't already, you should try using qwen-coder on Cerebras (or kimi-k2 on Groq). They are _really_ fast, and they might make the whole thing actually viable in terms of speed.
Also I wonder if eventually you could go further and skip the LLM entirely and just train a game world frame generator on productivity software.
Similar fun concept as the cataclysm library for Python: https://github.com/Mattie/cataclysm
[1]: https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSe...
Why not try it out, and if it doesn't work for you or creates more work for you, then ditch it. All these AI assist tools are just tools.
But pretty consistently, such claims are met with accusations of not having tried it correctly, or not having tried it with the best/newest AI model, or not having tried it for long enough.
Thus, it seems that if you don't agree with maximal usage of AI, you must be wrong and/or stupid. I can understand how that fosters the urge to criticize AI rather than just opt out.
(And if I enjoyed being gaslighted, I'd just be using the LLMs in the first place.)
I believe everyone has to deal with that, AI or not. There are bad human coders.
I've done integration for several years. There are integrations done with tools like Dell Boomi (no-code/low-code) that work but are hard to maintain, like you said. But what can you do? Your employer uses that tool to get it running until it can't anymore, as most no-code/low-code tools can get you to your goal most of the time. But when there's no "connector" or third-party connector that costs an arm and a leg, or hiring a Dell Boomi specialist to code that last mile, which will also cost an arm or a leg, then you turn to your own IT team to come up with a solution.
It's all part of IT life. When you're not the decision-maker, that's what you have to deal with. I'm not going to blame Dell Boomi for making my work extra hard or whatnot. It's just a tool they picked.
I am just saying that a tool is a tool. You can see many real life examples where you'll be pressured into using a tool and maintaining something created by such a tool, and not just in IT but in every field.
Even if LLMs do get 10x as fast, that's not even remotely enough. They are 1e9 times as compute intensive.
The Internet took something that used to be slow, cumbersome, expensive and made it fast, efficient, cheap.
Now we are doing it again.
I am a big proponent of AI. To me, this experiment mostly shows how not to use AI.
I'm looking for users who want to be co-owners of the platform. It supports pretty much any feature you may need to build complex applications, including views/filtering, indexing (incl. support for compound keys), JWT auth, access control, and efficient real-time updates. It's been battle-tested with apps with relatively advanced search requirements.
Why?
Probably because of cost and speed. Imagine asking a tool to get a list of your Amazon orders. This experiment shows it might code a solution, execute it, and come back to you in 60 seconds, and you cannot rely on the results because LLMs are non-deterministic. If you use a thinking model like GPT-5, the same might take 10 minutes to execute and you still cannot rely on the results.
Here you’re paying for decreased upfront effort with per-request cost and response time (both of which will go down in the future for sure). Eventually the cost and response time will both be low enough that it’s not worth the upfront effort of coding the solution. Just another amazing outcome of technology being on a continual path of improvement.
But “truly no-code” can never be deterministic - even though it’ll get close enough in future to be indistinguishable. And it’ll always be an order of magnitude less efficient than code.
This is why we have LLMs write code for us: they’re codifying the deterministic outcome we desire.
Maybe the best solution is a hybrid: after a few requests the LLM should just write code it can use to respond every time from then on.
Of course the generated code might not work in all cases or scenarios, or may have to be generated multiple times, and yes, it would be slower the first time... but subsequent invocations would just run the code that was generated.
I'm trying to imagine what this looks like practically.. it's a system that writes itself as you use it? I feel like there is a thread to tug on there actually.
I mean, I'll do the stuff I'm confident I can do, because I already can.
I'll let the AI do the stuff where I'm confident it can't fuck shit up.
I tried Xcode's built-in ChatGPT integration and Claude for some slightly-above-basic stuff that I already knew how to do, and they suggested some horribly inefficient ways of doing things and outdated (last year) APIs.
On the other hand, what I presume is Xcode's local model is nice for a sort of parameterized copy/paste or find/replace: slightly different versions of what I've already written, reducing effort on bothersome boilerplate that can't be eliminated.
Where it breaks down is in the repeatability of experience from user to user. It needs instructions that define the expectations of user experience across many people, which ends up being a spec in code, or code as spec.
Imagine if your door were to be generated every time you used it. The doorknob, key, even hinges would be different each time.
Ultimately, it is a new way to provide functionality but doesn’t quite remove all the code.
I can't see myself telling a client who pays millions a year that their logo sometimes will be in one place and sometimes in another.
but you are still generating code....?
Because we live on a planet with finite resources and running certain problems in an LLM is probably one of the most computationally expensive ways of solving them?
Usually I have to wait for the company running the API to push breaking changes without warning.
Abstractly, who cares what format the information is shared in? If it is complete, the rigidity of the schema *could* be irrelevant (in a future paradigm). Determinism is extremely helpful (and maybe vitally necessary) but, as I think this intends to demonstrate, *could* just be articulated as a form of optimization.
Fluid interpretation of API results would already be useful but is hopelessly problematic. How many of us already spend meaningful amounts of time "cleaning" data?
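That "cleaning" is exactly what rigid schemas spare us from, and it's worth seeing how mundane it is. A minimal sketch, with made-up field names, of the glue code that normalizes the same fact from three differently-shaped API payloads:

```python
# Three hypothetical APIs return the same order data in three shapes;
# deterministic glue code normalizes them into one schema.

def normalize_order(raw: dict) -> dict:
    # Accept several spellings of the same field from different sources.
    order_id = raw.get("order_id") or raw.get("orderId") or raw.get("id")
    total = raw.get("total") or raw.get("amount") or raw.get("price_total")
    return {"order_id": str(order_id), "total": float(total)}

samples = [
    {"order_id": 1, "total": "9.99"},
    {"orderId": "2", "amount": 12.5},
    {"id": 3, "price_total": "3"},
]
print([normalize_order(s) for s in samples])
# -> [{'order_id': '1', 'total': 9.99}, {'order_id': '2', 'total': 12.5},
#     {'order_id': '3', 'total': 3.0}]
```

An LLM interpreting payloads "fluidly" would replace this boilerplate, but also replace its guarantee: the function above fails loudly on a shape it doesn't know, while a model may silently guess.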
Sure, an LLM could write its own libraries and abstractions in a low-level language, and I'm sure there are some assembler- or C-level web API wrappers, but they would be nowhere near as comprehensive or battle-tested as the ones available for high-level languages.
This could definitely change in the future. I think we need a coding platform that is designed for optimised LLM use, but that still allows humans to understand and write it. Kind of a markdown for code. Sort of like what OP is trying to do, but with the built in benefit of having a common shared suite of tools for interoperability.
Ultimately, useless layers of state inevitably complicate the process beyond the goal you set out to test for.
In chip design land we're focused on streamlining the stack down to drawing geometry. Drawing will be faster when the machine isn't also losing cycles to state management built on decades of programmer opinions.
When the only decisions are to extend or delete a bit of geometry, we will eliminate more hallucinations and false positives (still not all) than we do trying to organize syntax whose importance is subtly different to everyone; misunderstanding fosters hallucinations.
Most software out there is developer tools and frameworks; they need to do a job.
Most users just want something like automated Blender that handles 80% of an ask (look like a word processor or a video game) they can then customize and has a "play" mode that switches out of edit mode. That’s the future machine and model we intend to ship. Fonts are just geometric coordinates. Memory matrix and pixels are just geometric coordinates. The system state is just geometric coordinates[1].
Text driven software engineering modeled on 1960-1970s job routines, layering indirection on math states in the machine, is not high tech in 2025 and beyond. If programmers were car people they would all insist on a Model T being the only real car.
Copy-paste quote about never getting one to understand something when their paycheck depends on them not understanding it.
Intelligence gave rise to language, language does not give rise to intelligence. Memorization and a vain sense of accomplishment that follows is all there is to language.
[1]https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...