>locally in 2-3 years, back in 2022.
That's pretty much accurate. Mine's been up and running on my clunky old home machine for six months, and just this morning I overheard a couple of coworkers talking about the local LLMs they're running. Right now a substantial minority of PCs could run useful models, and usage is small but growing rapidly. It's still the early adopter phase, but all you need is a relatively modest, years-old GPU to handle something like this model.
What isn't done, and probably won't be for a while, is a nice generic framework that lets us tie a local LLM into all sorts of local apps and processes. The big players all want us to use cloud services.
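
To be fair, the raw plumbing already exists for wiring up one app at a time. Most local runners (llama.cpp's llama-server, Ollama, etc.) expose an HTTP endpoint, many of them OpenAI-compatible, so a rough sketch like the one below works today. The port, path, and model name here are assumptions that depend entirely on your setup:

    # Minimal sketch: a local app talking to a local LLM over HTTP.
    # Assumes a local server exposing an OpenAI-compatible chat endpoint;
    # the URL and model name below are placeholders for whatever you run.
    import json
    import urllib.request

    def ask_local_llm(prompt, url="http://localhost:8080/v1/chat/completions"):
        payload = {
            "model": "local",  # many local servers ignore or alias this field
            "messages": [{"role": "user", "content": prompt}],
        }
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        return body["choices"][0]["message"]["content"]

    print(ask_local_llm("Summarize this file for me."))

But that's bespoke glue per app. The missing piece is the generic layer that lets any local program discover and call the model without each developer hand-rolling this.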