This sub and other AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the point that I’ve suspected astroturfing.
So it’s refreshing to see that maybe it’s just coincidence or confirmation bias on my end.
Thanks!
It makes using my Claude Pro sub actually feasible: I write a plan with it, then pick it up with my local model and implement it. Now I'm not running out of tokens haha.
Is it worth it from a unit economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine-tune and train models with the 128GB of memory it gets to work with. Adding up two 40GB vectors in CUDA was quite fun :)
For now I also use Z.ai's Lite plan for GLM-5.1, which is very capable in my experience.
I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha; it was obviously too cheap. Or all the *claw users killed it.
Most recently I used it to develop a script to help me manage email. The implementation included interacting with my provider over JMAP, taking various actions, and implementing an automated unsubscribe flow. It was greenfield, and quite trivial compared to the codebases I normally interact with, but it was definitely useful.
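For anyone curious what the JMAP side of an unsubscribe flow can look like, here's a minimal sketch (not the script described above). It assumes a JMAP mail server per RFC 8620/8621; `build_unsubscribe_lookup` and `pick_unsubscribe_url` are hypothetical helper names, the account id and sender are placeholders, and session discovery and the HTTP POST to the API URL are left out.

```python
import json
from typing import Optional

# Capability URIs from RFC 8620 (JMAP core) and RFC 8621 (JMAP mail).
JMAP_USING = ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"]


def build_unsubscribe_lookup(account_id: str, sender: str, limit: int = 50) -> dict:
    """Build one JMAP request (hypothetical helper) that finds emails from
    `sender` and fetches their List-Unsubscribe header parsed as URLs."""
    return {
        "using": JMAP_USING,
        "methodCalls": [
            [
                "Email/query",
                {
                    "accountId": account_id,
                    "filter": {"from": sender},
                    "sort": [{"property": "receivedAt", "isAscending": False}],
                    "limit": limit,
                },
                "q",
            ],
            [
                "Email/get",
                {
                    "accountId": account_id,
                    # Back-reference: reuse the ids returned by the "q" call.
                    "#ids": {"resultOf": "q", "name": "Email/query", "path": "/ids"},
                    # RFC 8621 lets you request a header in a parsed form.
                    "properties": ["id", "subject", "header:List-Unsubscribe:asURLs"],
                },
                "g",
            ],
        ],
    }


def pick_unsubscribe_url(email: dict) -> Optional[str]:
    """Prefer an https: unsubscribe link over a mailto: one (hypothetical helper)."""
    urls = email.get("header:List-Unsubscribe:asURLs") or []
    for url in urls:
        if url.startswith("https://"):
            return url
    for url in urls:
        if url.startswith("mailto:"):
            return url
    return None


if __name__ == "__main__":
    request = build_unsubscribe_lookup("ACCOUNT_ID", "news@example.com")
    print(json.dumps(request, indent=2))
```

Batching the query and the get into one request with a back-reference saves a round trip. An actual flow would also check for a `List-Unsubscribe-Post` header before doing a one-click POST (RFC 8058), which this sketch doesn't handle.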
At least, that's my theory.
The big advantages of local on a business level are:
- Freezing your model's exact settings once you've locked in a workflow that works just fine.
- Guarding against insane token usage from LLMs that have been told never to stop until they figure out the solution, or from an LLM run that was set up incorrectly. (The latter happened to me with Gemini 3.1 Pro.)
- PII, or some other need for on-premises-only LLMs.
When I argue this, my point is that FOSS shouldn't target the desktop with open weights; it should target H200s, i.e. really big models with big VRAM requirements.
Those can always be distilled down, but you can't really go the other way.
The TL;DR is that unless you are doing it as a hobby, or working in an environment where none of the data privacy options supported by Anthropic/OpenAI (including running on Azure/Bedrock with ZDR) work for you, it's not worth it.
The best open models are around the Sonnet 4.6 level. That's excellent, but the level of tasks you can give to GPT 5.4 or Opus 4.6 is just so much higher it doesn't compare (and Opus 4.7 seems noticeably better in my few hours of testing too).
I have my own benchmarks, but I like this under-publicized OpenHands page: https://index.openhands.dev/home
It shows that closed models do best on every task they test. The closest an open model gets is MiniMax 2.7 on issue resolution, where it's ~1% worse than the leaders.
That matches my experience: fine for small problems, but well behind as the task gets bigger.