Unless you can run the LLM locally, on a computer you own, you are now completely dependent on a remote centralized system to do your work. Whoever controls that system can arbitrarily raise prices, subtly manipulate the outputs, store your inputs and do anything they want with them, or even suddenly cease to operate. And since, according to this article, only the latest and greatest LLM is acceptable (I saw that exact same argument six months ago), running locally is not viable (in a recent discussion, someone mentioned a home server with something like 384G of RAM just to run one LLM locally).
To those of us who like Free Software because of the freedom it gives us, this is a severe regression.
This sounds a bit like bailing out the ocean.
In fact, MCP is so groundbreaking that I consider it to be the actual meat and potatoes of coding AIs. Large models are too monolithic, and knowledge is forever changing. Better to just use a small 14b model (or even 8b in some cases!) with some MCP search tools, a good knowledge graph for memory, and a decent front end for everything. Let it teach itself based on the current context.
And all of that can run on an off the shelf $1k gaming computer from Costco. It’ll be super slow compared to a cloud system (like HDD vs SSD levels of slowness), but it will run in the first place and you’ll get *something* out of it.
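The core idea here, a small model deferring to tools instead of relying on baked-in knowledge, can be sketched in plain Python. To be clear, this is not the real MCP protocol (which is JSON-RPC over stdio/HTTP); every function here is a hypothetical stand-in for the actual model and search tool:

```python
# Minimal sketch of an MCP-style tool loop: the "model" emits a tool
# request, the host runs the tool and feeds the result back, and the
# model answers from that result instead of from memorized knowledge.

def search_docs(query: str) -> str:
    """Stand-in for an MCP search tool; a real one would query the web."""
    knowledge = {"mcp": "MCP is an open protocol for wiring models to tools."}
    for key, fact in knowledge.items():
        if key in query.lower():
            return fact
    return "no results"

def small_model(prompt: str) -> str:
    """Stand-in for an 8-14B local model: asks for a tool, then answers."""
    if "TOOL_RESULT:" not in prompt:
        return "CALL_TOOL search_docs " + prompt
    return "Based on search: " + prompt.split("TOOL_RESULT:", 1)[1].strip()

def run(prompt: str) -> str:
    reply = small_model(prompt)
    while reply.startswith("CALL_TOOL "):
        _, tool, query = reply.split(" ", 2)
        result = search_docs(query) if tool == "search_docs" else "unknown tool"
        reply = small_model(prompt + "\nTOOL_RESULT: " + result)
    return reply

print(run("What is MCP?"))
```

The point of the pattern is that the model's weights can stay small and stale; the freshness lives in the tools.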
It's not black magic anymore.
* Not even counting cellular data carriers, I have a choice of at least five ISPs in my area. And if things get really bad, I can go down to my local library to politely encamp myself and use their WiFi.
* I've personally no need for a cloud provider, but I've spent a lot of time working on cloud-agnostic stuff. All the major cloud providers (and many of the minors) provide compute, storage (whether block, object, or relational), and network ingress and egress. As long as you don't deliberately tie yourself to the vendor-specific stuff, you're free to choose among all available providers.
* I run Linux. Enough said.
* Hmm, what kind of software do you write that pays your bills?
* And your setup doesn't require any external infrastructure to be kept up to date?
Open source of course.
So what's my response when it gets deprecated? Maintaining it myself? Nope: finding another library.
You always depend on something...
My company has set this up for one of our customers (I wasn't involved).
It's not off-grid, but that's the eventual dream/goal.
True, but I think wanting to avoid yet another dependency is a good thing.
This actually to me implies the opposite of what you’re saying here. Why bother relearning the state of the art every few months, versus waiting for things to stabilize on a set of easy-to-use tools?
Folks that are running local LLMs every day now will probably say you can basically emulate at least Sonnet 3.7 for coding if you have a real AI workstation. Which may be true, but the time, effort, and cost involved are substantial.
See the Microsoft ecosystem as an example. Nothing they do could not be replicated, but the network effects they achieved are strong. Too much glue, and 3rd party systems, and also training, and what users are used to, and what workers you could hire are used to, now all point to the MS ecosystem.
In this early mass-AI-use phase you can still easily switch vendors, sure. Just like in the 1980s you could still choose some other OS or office suite (like Star Office, the basis for OpenOffice; or Lotus, WordStar, or WordPerfect) without paying that kind of ecosystem cost, because it did not exist yet.
Today too much infrastructure and software relies on the systems from one particular company to change easily, even if the competition were able to provide a better piece of software in one area.
In the past 20 years, memory capacity has grown about 32x (five doublings).
If that pace holds, we could have 16 TB memory computers in 2045.
That could unlock a lot of possibilities, even if 1 TB turns out not to be enough by then (better architectures, more compact representations of data, etc.).
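The arithmetic behind that projection, assuming the same pace continues and taking a 512 GB machine as today's high-end baseline (my assumption, not a figure from the thread):

```python
# 32x growth over 20 years = 5 doublings, i.e. one doubling every ~4 years.
doublings = 5
growth = 2 ** doublings            # 32x over the next 20 years
today_gb = 512                     # assumed high-end workstation today
gb_2045 = today_gb * growth
print(growth, gb_2045 // 1024)     # 32x growth -> 16 TB by 2045
```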
Still, I suppose that's better than what nvidia has on offer atm (even if a rack of gpus gives you much, much higher memory throughput).
In some cases it's more cost-effective to get M-series Mac Minis vs Nvidia GPUs
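A quick dollars-per-gigabyte-of-model-memory comparison shows why that can be true. The street prices below are rough assumptions for illustration, not quotes:

```python
# Rough $/GB of memory usable for model weights (hypothetical prices).
options = {
    "Mac Mini, 64 GB unified memory": (2000, 64),   # assumed price, USD
    "Single 24 GB GPU":               (1800, 24),   # assumed price, USD
}
for name, (price_usd, mem_gb) in options.items():
    print(f"{name}: ${price_usd / mem_gb:.0f}/GB")
```

The GPU wins decisively on memory bandwidth and compute, but for fitting large model weights at all, unified memory can be the cheaper ticket.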
For the past few years, we've been "getting smaller" by getting deeper. The diameter of the cell shrinks, but the depth of the cell goes up. As you can imagine, that doesn't scale very well: cutting the cylinder's diameter in half forces its depth to double just to keep the same surface area (which is what the capacitor's stored charge depends on), so the aspect ratio keeps getting worse.
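The geometry is easy to check: a DRAM cell capacitor's charge storage scales with the cylinder's lateral surface area (pi * d * h), and keeping that constant while halving the diameter forces the depth to double, which makes the aspect ratio (depth over diameter) four times worse each shrink:

```python
import math

def lateral_area(d: float, h: float) -> float:
    # Capacitance ~ lateral surface area of the cylindrical cell.
    return math.pi * d * h

d, h = 1.0, 10.0                       # illustrative starting cell geometry
d2, h2 = d / 2, h * 2                  # halve diameter -> depth must double
assert math.isclose(lateral_area(d2, h2), lateral_area(d, h))
print((h2 / d2) / (h / d))             # aspect ratio grows 4x per shrink
```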
If you try to put the cells closer together, you start to get quantum tunneling, where electrons disappear from one cell and appear in another, altering charges in unexpected ways.
The times of massive memory shrinks are over. That means we have to reduce production costs and put more chips per computer, or find a new kind of memory that is mass-producible.
The point being made here is that a developer that can only do their primary job of coding via a hosted LLM is entirely dependent on a third party.
You make a good point, of course, that independence is important. But first, this ship sailed long ago; second, more than one party provides the service you depend on. If one fails, you still have at least some alternatives.
It's fair to be worried about depending on LLMs. But I find the dependence on things like AWS or Azure more problematic, if we are talking about centralized and proprietary systems.
There are all kinds of trades that the car person and the non-car person makes for better or worse depending on the circumstance. The non-car person may miss out on a hobby, or not know why road trips are neat, but they don't have the massive physical and financial liabilities that come with them. The car person meanwhile—in addition to the aforementioned issues—might forget how to grocery shop in smaller quantities, or engage with people out in the world because they just go from point A to B in their private vessel, but they may theoretically engage in more distant varied activities that the non-car person would have to plan for further in advance.
Taking the analogy a step further, each party gradually sets different standards for themselves that push the two archetypes into diametrically opposed positions. The non-car owner's life doesn't just not depend on cars, but is often actively made worse by their presence. For the car person, the presence of people, especially those who don't use a car, gradually becomes over-stimulating; cyclists feel like an imposition, people walking around could attack at any moment, even other cars become the enemy. I once knew someone who'd spent his whole life commuting by car, and when he took a new job downtown, had to confront the reality that not only had he never taken the train, he'd become afraid of taking it.
In this sense, the rise of LLMs does remind me of the rise of frontend frameworks, bootcamps that started with React or React Native, high-level languages, and even things like having great internet; the only people who ask what happens in a less ideal case are the ones who've either dealt with those constraints first-hand, or have tried to simulate them. If you've never been to the countryside, or a forest, or a hotel, you might never consider how your product responds in a poor-connectivity environment, and these are the people who wind up getting lost on basic hiking trails having assumed that their online map would produce relevant information and always be there.
Edit: To clarify, in the analogy, it's clear that cars are not intrinsically bad tools or worthwhile inventions, but had excitement for them been tempered during their rise in commodification and popularity, the feedback loops that ended up all but forcing people to use them in certain regions could have been broken more easily.
To be fair, the entire internet is basically this already.
Sure, but that is not the point of the article. LLMs are useful. The fact that you are dependent on someone else is a different problem, like being dependent on Microsoft for your office suite.
$200-300/month is already $7k+ over 3 years.
And I do expect some chip-based hardware models in a few years, like a GPU.
An "AIPU" where you can replace the hardware AI chip.
> $200-300/month is already $7k+ over 3 years.
Except at current crazy rates of improvement, cloud based models will in reality likely be ~50x better, and you'll still have the same system.
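The break-even math is easy to run with hypothetical numbers (say, a $7k local workstation vs. a $250/month subscription, ignoring electricity and the quality gap both sides are arguing about):

```python
workstation_usd = 7000    # assumed one-time local hardware cost
monthly_usd = 250         # assumed cloud subscription price
months = workstation_usd / monthly_usd
print(months, months / 12)   # months and years until the hardware "pays off"
```

The catch, as the parent comment notes, is that the subscription keeps buying you a better model while the workstation stays frozen in time.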
2.5 years ago it could just about run LLaMA 1, and that model sucked.
Today it can run Mistral Small 3.1, Gemma 3 27B, Llama 3.3 70B - same exact hardware, but those models are competitive with the best available cloud-hosted model from two years ago (GPT-4).
The best hosted models (o3, Claude 4, Gemini 2.5 etc) are still way better than the best models I can run on my 3-year-old laptop, but the rate of improvements for those local models (on the same system) has been truly incredible.
I agree; we will see how this plays out. But I hope models become more efficient, and then for certain things it might not matter that much to run some parts locally.
I could imagine an LLM trained on far fewer natural languages and optimized for a single programming language. Like "generate your own model".
Therefore using your own bare metal is a lot of expensive redundancy.
For the cloud provider they can utilise the GPU to make it pay. They can also subsidise it with VC money :)
FOSS is more about:
1. Finding some software you can use for your problem
2. Hit an issue with your particular use case.
3. Download the code and fix the issue.
4. Cleanup the patch and send a proposal to the maintainer. PR is easy, but email is ok. You can even use a pastebin service and post it on a forum (suckless does that in part).
5. The maintainer merges the patch and you can revert to the official version, or they don't and you decide to go with your fork.
Self-hosting has always had a lot of drawbacks compared with commercial solutions. I bet my self-hosted file server has worse reliability than Google Drive, and my self-hosted git server handles fewer concurrent users than GitHub.
It's one thing you must accept when self-hosting.
So when you self-host an LLM, you must either accept a drop in output quality or spend a small fortune on hardware.
The Raspberry Pi was a huge step forward; the move to LLMs is two steps back.
Maven Central is gone and you have no proxy set up, or your local cache is busted? Poof, you're fucking gone: all your Springs, Daggers, Quarkuses, and every piece of third-party crap that makes up your program is gone. The same applies to the bazillion JS and Rust libraries.
A guy here says you need 4 TB for a PyPI mirror and 285 GB for npm:
https://stackoverflow.com/questions/65995150/is-it-possible-...
We're not yet at that same point for performance of local LLM models afaict, though I do enjoy messing around with them.