Based on some rough, conservative ballpark estimates (one server with 2 A100s at $50,000; 50 tokens/s on one of those servers; so 10 of those servers), the upfront cost with consumer hardware seems to be 1/10 to 1/20 of what the Groq hardware costs. I would guess that realistically cloud providers can probably achieve half to 1/3 of that price.
So unless you need the low latency of Groq, consumer hardware seems to be a lot cheaper for the same throughput.
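
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The inputs are just the ballpark figures above, not measurements. The Groq range is only what the 1/10 to 1/20 ratio implies, and reading "that price" as the A100 upfront cost is my assumption.

```python
# Ballpark figures from the estimate above; none of these are measured values.
SERVER_COST_USD = 50_000         # one server with 2x A100
TOKENS_PER_S_PER_SERVER = 50     # assumed throughput of one such server
NUM_SERVERS = 10

# Upfront cost and aggregate throughput of the A100 setup.
gpu_cost = SERVER_COST_USD * NUM_SERVERS
gpu_throughput = TOKENS_PER_S_PER_SERVER * NUM_SERVERS

# If the A100 setup is 1/10 to 1/20 of the Groq price, the implied
# Groq price is 10x to 20x the A100 cost (low, high).
implied_groq_cost = (gpu_cost * 10, gpu_cost * 20)

# Assumption: "half to 1/3 of that price" refers to the A100 upfront cost.
cloud_cost = (gpu_cost / 3, gpu_cost / 2)

print(f"A100 setup: ${gpu_cost:,} for {gpu_throughput} tokens/s")
print(f"Implied Groq cost: ${implied_groq_cost[0]:,} to ${implied_groq_cost[1]:,}")
print(f"Cloud provider guess: ${cloud_cost[0]:,.0f} to ${cloud_cost[1]:,.0f}")
```

Changing any of the three constants at the top shows how sensitive the comparison is to the per-server throughput guess.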