Oh dear, 14B and a 4-bit quant? There are going to be a lot of embarrassed programmers who need to explain to their engineering managers why their MacBook can't reasonably run LLMs like they said it could. (This already happened at my Fortune 20 company lol)
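For context on why "14B at 4-bit" is a big step down: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The sketch below covers only the raw weights. It assumes a flat bit width and ignores the KV cache, activations, runtime overhead, and the extra per-block scale data that real 4-bit formats carry, so actual usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in GB (1e9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits; KV cache,
    activations, and quantization scale overhead are NOT included.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"14B @ {bits}-bit: {weight_memory_gb(14e9, bits):.1f} GB")
```

Running it prints 28.0, 14.0 and 7.0 GB, which is why a 4-bit 14B model fits on a laptop where larger or less-quantized models do not.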
Replies like "but, but other laptops" are very weak attempts at deflection.
What is "it" and what didn't it do?
With some custom tooling, we built our own local enterprise setup:
- Support ticketing system
- Custom chat support powered by our trained software-support model
- Repository of resolved issues with detailed step-by-step instructions
- User-created reports and queries
- Natural-language-driven report generation (my favorite: no more dragging filters into the builder; our (secret) local model handles it for clients)
- In-application tools (C#/SQL/ASP.NET) to support users directly, since our software runs on-site and offline due to PPI
- A cool repair tool: an import/export “support file packet patcher” that lets us push fixes live to all clients or target niche cases

Qwen3 with LoRA fine-tuning is also incredible; we’re already seeing great results training our own models.
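For readers unfamiliar with LoRA: instead of updating a full weight matrix W, you freeze it and train two small matrices B (d_out × r) and A (r × d_in), then use W + (alpha / r) · B·A. The pure-Python sketch below only illustrates that merge and the parameter savings. It is not the setup described above; real Qwen3 fine-tuning would go through a library such as Hugging Face PEFT, and the matrix values here are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def lora_merge(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w: frozen base weight, shape (d_out, d_in)
    b: trainable factor, shape (d_out, r); usually initialized to zeros
    a: trainable factor, shape (r, d_in)
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def trainable_params(d_out: int, d_in: int, r: int) -> tuple:
    """(full fine-tune params, LoRA params) for one d_out x d_in layer."""
    return d_out * d_in, r * (d_out + d_in)


if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight (toy values)
    a = [[1.0, 2.0, 3.0]]                    # rank r = 1
    b = [[0.5], [0.0]]
    print(lora_merge(w, a, b, alpha=2.0))    # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
    print(trainable_params(4096, 4096, 8))   # (16777216, 65536)
```

The second line shows the appeal: for a hypothetical 4096 × 4096 layer at rank 8, LoRA trains about 65K parameters instead of about 16.8M, which is what makes fine-tuning feasible on modest hardware.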
There’s a growing group pushing K2.5s to run on consumer PCs (32GB RAM plus at least 9GB VRAM), and it’s looking very promising. If this works, we’ll be retooling everything: our apps and in-house programs. Exciting times ahead!