Now factor in that modern LLMs tend to use 4-bit inference, and that Blackwell is significantly better optimized for 4-bit, and the cost could come in well under 11 cents. Maybe a 5x speedup going from 8-bit on H100 to 4-bit on Blackwell?
So we're potentially looking at 11 / 5 = 2.2 cents per million tokens.
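To make the back-of-envelope math explicit, here is a rough sketch. It assumes cost per token scales inversely with throughput, and the 5x figure is only a guess. The `price_ratio` parameter is a hypothetical knob I'm adding (not something measured) for the case where the newer GPU costs more per hour than the H100; at 1.0 it reproduces the number above.

```python
def scaled_cost_cents(baseline_cents_per_mtok: float,
                      speedup: float,
                      price_ratio: float = 1.0) -> float:
    """Estimate cost per million tokens after a throughput speedup.

    Assumption: cost scales as (hourly price) / (tokens per hour), so a
    speedup divides the cost and a higher hourly price multiplies it.
    price_ratio is hypothetical: new GPU hourly price / old GPU hourly price.
    """
    if speedup <= 0 or price_ratio <= 0:
        raise ValueError("speedup and price_ratio must be positive")
    return baseline_cents_per_mtok * price_ratio / speedup


if __name__ == "__main__":
    baseline = 11.0  # cents per million tokens (H100, 8-bit), from the estimate above
    speedup = 5.0    # guessed speedup for 4-bit on Blackwell

    cost = scaled_cost_cents(baseline, speedup)
    print(f"{cost:.1f} cents per million tokens")
```

If Blackwell time turns out to cost, say, twice as much per hour, passing `price_ratio=2.0` would put the estimate at 4.4 cents instead, so the 2.2 figure is best read as a lower bound under the same-price assumption.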