I have recently been exploring and testing the approach of offloading AI inference to the user's device instead of relying on a cloud API.
It has many advantages, and I can see it becoming the norm in the future.
I was also surprised by the number of people who have GPUs, and by how well SLMs (small language models) perform in many cases, even those with just 1B parameters.