You hit the nail on the head about why I get so much hatred from "AI Bros," as I call them, when I say it won't truly take off until it runs on your phone effortlessly, because nobody wants to foot a trillion-dollar cloud bill.
Give me a fully offline LLM that fits in 2GB of VRAM, then let's refine it so it can plug into external APIs, and see how much farther we can take things without burning billions of dollars' worth of GPU compute. I don't care whether my answer arrives instantly; if I'm doing the research myself, I want to take my time and get the correct answer anyway.
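
For what it's worth, that setup is already roughly within reach: a 4-bit GGUF quant of a ~1-3B-parameter model lands around 2GB, and something like llama-cpp-python can run it fully offline while a thin loop handles the external API calls. Here's a minimal sketch of that idea; the model path, the `TOOL:` convention, and `lookup_weather` are all placeholder assumptions for illustration, not any particular project's API.

```python
# Minimal sketch: a small quantized local model plus a hand-rolled tool loop.
# Assumes llama-cpp-python is installed and a ~2 GB 4-bit GGUF file exists
# at the (placeholder) path below.
import json
from llama_cpp import Llama

llm = Llama(
    model_path="models/small-model.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to the GPU if it fits
    verbose=False,
)

def lookup_weather(city: str) -> str:
    """Stand-in for an external API; swap in any real HTTP request here."""
    return json.dumps({"city": city, "forecast": "sunny", "high_c": 24})

SYSTEM = (
    "You can call tools. To use one, reply with exactly:\n"
    'TOOL: {"name": "lookup_weather", "args": {"city": "..."}}\n'
    "Otherwise answer normally."
)

def ask(question: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": question},
    ]
    reply = llm.create_chat_completion(messages=messages, max_tokens=256)
    text = reply["choices"][0]["message"]["content"].strip()

    # If the model asked for the tool, run it and feed the result back in.
    if text.startswith("TOOL:"):
        call = json.loads(text[len("TOOL:"):])
        result = lookup_weather(**call["args"])
        messages += [
            {"role": "assistant", "content": text},
            {"role": "user", "content": f"Tool result: {result}"},
        ]
        reply = llm.create_chat_completion(messages=messages, max_tokens=256)
        text = reply["choices"][0]["message"]["content"].strip()
    return text

print(ask("What's the weather in Oslo?"))
```

It won't be fast, but that's the point: the whole thing runs on hardware you already own, and the only network traffic is the API calls you actually asked for.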