You can get something pretty fast right now with a Cerebras Coder subscription, sadly I think the best model they had last I checked was the somewhat dated GLM 4.7:
https://inference-docs.cerebras.ai/models/overviewI feel like if they got DeepSeek V4 Flash and Pro running on their hardware, even if at less than 1000 tok/s, they’d still be crushing it with any subscription they’d provide, given how generous their token limits were.