I am 100% all about using local models instead of sending someone else all my data and paying for the privilege of doing so, this article is misleading.
I can get a 27b model to kick out 40 tok/s on 16 gb vram. This is the area ripe for development.
If you can’t connect a monitor, it isn’t a standard GPU, at least not in the way people have spoken about GPUs until a few years ago.