undefined | Better HN

0 pointsnyeah1mo ago0 comments

Can you give a few penciled numbers?

0 comments

paulddraper1mo ago

You can rent a H100 GPU for $4/hour. [1]

300k tokens for that hour.

OpenAI charges $6.

Those are pessimistic assumptions.

[1] https://lambda.ai/instances

hajile1mo ago

Can you keep that GPU 100% saturated at least 16 hours per day every day of the week?

If not, you aren't breaking even.

paulddraper1mo ago

Note this is also assuming you

(1) Rent your GPUs.

(2) Pay list price, no volume breaks.

(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized.

Inference is extremely profitable at scale.

aurareturn1mo ago

Assuming 80GB H100 and you inference a model that is MoE and close to the size of the 80GB VRAM, you're going to see around 10k tokens/second fully batched and saturated. An example here might be Mixtral 8x7B.

You're generating about 36 million tokens/hour. Cost of Mixtral 8x7b on Open router is $0.54/M input tokens. $0.54/M output tokens.

You're looking at potentially $38.88/hour return on that H100 GPU. This is probably the best case scenario.

In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.

drakythe1mo ago

3.99 at 8x instances, with a minimum 2 week commitment. Good luck getting 70% usage average during that time. Useful when you're running a training round and can properly gauge demand, not so great when you're offering an API.

infecto1mo ago

Is it not a good penciled number? It helps set the directional tone that at inference cost is being covered.

drakythe1mo ago

It says the numbers are theoretically possible. Requiring a 66% usage to break even when 100% usage will piss off customers by invoking a queue means it’s a balancing act.

“Technically correct. The best kind of correct”. So inference may technically be _capable_ of being profitable, but I have question’s about them being profitable in _practice_.

j / k navigate · click thread line to collapse

0 comments

paulddraper1mo ago

You can rent a H100 GPU for $4/hour. [1]

300k tokens for that hour.

OpenAI charges $6.

Those are pessimistic assumptions.

[1] https://lambda.ai/instances

hajile1mo ago

Can you keep that GPU 100% saturated at least 16 hours per day every day of the week?

If not, you aren't breaking even.

paulddraper1mo ago

Note this is also assuming you

(1) Rent your GPUs.

(2) Pay list price, no volume breaks.

(3) Get only 85 tokens/sec. Realistically, frontier models would attain 200+ tokens/second amortized.

Inference is extremely profitable at scale.

aurareturn1mo ago

You're generating about 36 million tokens/hour. Cost of Mixtral 8x7b on Open router is $0.54/M input tokens. $0.54/M output tokens.

You're looking at potentially $38.88/hour return on that H100 GPU. This is probably the best case scenario.

In reality, inference providers will use multiple GPUs together to run bigger, smarter models for a higher price.

drakythe1mo ago

infecto1mo ago

Is it not a good penciled number? It helps set the directional tone that at inference cost is being covered.

drakythe1mo ago

It says the numbers are theoretically possible. Requiring a 66% usage to break even when 100% usage will piss off customers by invoking a queue means it’s a balancing act.

“Technically correct. The best kind of correct”. So inference may technically be _capable_ of being profitable, but I have question’s about them being profitable in _practice_.

j / k navigate · click thread line to collapse