
0 points | aurareturn | 4mo ago | 0 comments

I think they're also running this at 16-bit precision. If they drop to 8-bit, they might double their throughput, which would halve the cost to roughly 11 cents per million tokens.

Now take into account that modern LLMs tend to use 4-bit inference, and that Blackwell is significantly better optimized for 4-bit, and we could see much less than 11 cents. Maybe a 5x speedup going from H100 at 8-bit to Blackwell at 4-bit?

So we're looking at potentially 2.2 cents per million tokens.
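
To make the back-of-envelope math explicit, here is a minimal sketch. It assumes cost per token scales inversely with throughput and that the hourly hardware cost stays the same across configurations, which is a simplification since Blackwell and H100 are priced differently. The 22-cent starting point is only implied by the doubling above, and the 2x and 5x factors are the guesses from this comment, not measurements.

```python
def cost_after_speedup(base_cents: float, speedup: float) -> float:
    """Cost per million tokens after a throughput speedup.

    Assumes the hourly hardware cost is unchanged, so cost per token
    is inversely proportional to tokens generated per second.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return base_cents / speedup


# Assumption: 22 cents at 16-bit, implied by "double the output -> 11 cents".
fp16_cents = 22.0

# Guess from the comment: 8-bit doubles throughput.
int8_cents = cost_after_speedup(fp16_cents, 2.0)

# Guess from the comment: Blackwell at 4-bit is ~5x faster than H100 at 8-bit.
fp4_blackwell_cents = cost_after_speedup(int8_cents, 5.0)

print(f"8-bit on H100:      {int8_cents:.1f} cents per million tokens")
print(f"4-bit on Blackwell: {fp4_blackwell_cents:.1f} cents per million tokens")
```

If Blackwell costs more per hour than H100, the last figure should be multiplied by that price ratio, so 2.2 cents is best read as a floor under these assumptions.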
