300k tokens for that hour.
OpenAI charges $6.
Those are pessimistic assumptions.
If not, you aren't breaking even.
(1) Rent your GPUs.
(2) Pay list price, no volume breaks.
(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized.
Inference is extremely profitable at scale.
You're generating about 36 million tokens/hour. Cost of Mixtral 8x7b on Open router is $0.54/M input tokens. $0.54/M output tokens.
You're looking at potentially $38.88/hour return on that H100 GPU. This is probably the best case scenario.
In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.
“Technically correct. The best kind of correct”. So inference may technically be _capable_ of being profitable, but I have question’s about them being profitable in _practice_.