lukev 1y ago

Model training is what costs so much. I would expect OpenAI makes a profit on inference services.

rightbyte 1y ago

Running models locally brings my beefy rig to its knees for about half a minute per query, even for smaller models. Answering queries has to be expensive too?

dartos 1y ago

The hardware required is the same, just in different amounts.

It’s less (gross) expensive for inference, since it takes less time, but the cost of that time (per second) is the same as training.

lukev (OP) 1y ago

Obviously, that's my point.

We can do the math. GPT-4o can emit about 70 tokens a second. API pricing is $10/million for output tokens and $2.5/million for input tokens.

Assume a workload where input tokens outnumber output tokens 10:1, and that I can generate continuous load (constantly generating tokens). I'll end up paying about $210/day in API fees, or $76,650 in a year.
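The arithmetic above can be checked in a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this comment (70 tokens/s, $10 and $2.50 per million tokens, a 10:1 input-to-output ratio); the function and parameter names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_api_fees(
    output_tokens_per_sec=70,       # quoted GPT-4o output rate
    input_to_output_ratio=10,       # assumed workload mix from the comment
    output_price_per_million=10.0,  # USD per million output tokens
    input_price_per_million=2.5,    # USD per million input tokens
):
    """Return the daily API bill in USD for one continuously busy stream."""
    output_tokens = output_tokens_per_sec * SECONDS_PER_DAY
    input_tokens = output_tokens * input_to_output_ratio
    total = (
        output_tokens * output_price_per_million
        + input_tokens * input_price_per_million
    )
    return total / 1_000_000


daily = daily_api_fees()
yearly = daily * 365
print(f"${daily:,.2f}/day, ${yearly:,.0f}/year")
```

Unrounded, this comes to roughly $211.68/day and $77,263/year; the $210/day and $76,650/year figures come from rounding the daily number down first.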

Let's assume the hardware required to service this load is a rack of 8 H100s (probably not accurate, but likely in the ballpark). That costs $240k.

So the hardware would pay for itself in 3 years. It probably has a service life of about double that.
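As a rough sketch of the payback claim, using the $240k hardware figure and the $76,650/year in fees from above. The six-year service life is just "about double" the payback period as stated here, and the helper names are hypothetical.

```python
def payback_years(hardware_cost, annual_fees):
    """Years of API fees needed to cover the up-front hardware cost."""
    return hardware_cost / annual_fees


def lifetime_surplus(hardware_cost, annual_fees, service_life_years):
    """Fees collected over the service life minus the hardware cost.

    Ignores energy, staff, and every other operating cost.
    """
    return annual_fees * service_life_years - hardware_cost


years = payback_years(240_000, 76_650)
surplus = lifetime_surplus(240_000, 76_650, service_life_years=6)
print(f"payback: {years:.2f} years, surplus over 6 years: ${surplus:,.0f}")
```

That works out to a bit over 3.1 years to break even on hardware alone, before any other costs are counted.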

Of course we have to consider energy too. Each H100 draws 700 watts, meaning our rack is 5.6 kilowatts, so we're looking at about 49 megawatt-hours to operate for the year. Let's assume they pay wholesale electricity prices of $50/MWh (not unreasonable), and you're looking at a ~$2,500 annual energy bill.
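The energy estimate can be sketched the same way. It uses only the numbers in this comment (8 GPUs at 700 W each, $50/MWh) and, like the comment, counts GPU power draw only; cooling and the rest of the server are left out. Names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy(num_gpus=8, watts_per_gpu=700, price_per_mwh=50.0):
    """Return (MWh per year, USD per year) for a rack running flat out."""
    rack_watts = num_gpus * watts_per_gpu            # 5,600 W = 5.6 kW
    mwh = rack_watts * HOURS_PER_YEAR / 1_000_000    # Wh -> MWh
    return mwh, mwh * price_per_mwh


mwh, cost = annual_energy()
print(f"{mwh:.1f} MWh/year, ${cost:,.0f}/year")
```

This gives about 49.1 MWh and roughly $2,450 a year, in line with the ~$2,500 estimate.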

So there's no reason to think that inference alone isn't a profitable business.

semanticist 1y ago

That doesn't sound like brilliant margins, to be honest. You've left out all the "running a business" costs, plus the model training costs. They need to pay their staff, offices, and especially lawyers (for all the lawsuits over the scraped content used to train the models).

It's not unusual for a startup to be unprofitable, and this one obviously is, but I'm not sure why isolating one aspect of their business and declaring it profitable would justify the idea that this company is inevitably a good investment "even if the company went defunct tomorrow".

Perhaps you meant "win" in the sense of "being influential" or something, but I'm pretty sure the people who invested billions of dollars use definitions that involve more concrete returns on their investment.

lukev (OP) 1y ago

Oh they are 100% losing money hand over fist if you include training costs and the eye-watering salaries they pay some of their employees.

I was responding to someone upthread suggesting that they were running even inference at a loss.

neonbjb 1y ago

You're missing the fact that requests are batched. It's 70 tokens per second for you, but also for tens to hundreds of other paying customers at the same time.
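To see how much batching changes the picture, here is a rough sketch that reuses the $210/day and $240k figures from upthread. It assumes, purely for illustration, that every concurrent stream pays the same fees and that the same rack serves all of them at full speed; real batched throughput depends on the model and hardware, and none of these numbers come from OpenAI.

```python
DAILY_FEES_PER_STREAM = 210.0  # USD/day for one saturated stream (upthread estimate)
HARDWARE_COST = 240_000.0      # USD for the assumed rack of 8 H100s (upthread estimate)


def payback_days(concurrent_streams):
    """Days of fees needed to cover the hardware, given N identical streams.

    Hypothetical model: fees scale linearly with the number of batched streams.
    """
    return HARDWARE_COST / (DAILY_FEES_PER_STREAM * concurrent_streams)


for streams in (1, 10, 100):
    print(f"{streams:>3} streams: {payback_days(streams):,.0f} days to pay off the rack")
```

Under these assumptions the payback period drops from about three years for a single stream to a few months at 10 streams and under two weeks at 100.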

lukev (OP) 1y ago

All these efficiencies just increase OpenAI's margin on inference. Of course it's not "one cluster per customer", and of course a customer can't saturate a cluster by themselves. My illustration was only to point out that the economics work.

dartos 1y ago

Inference alone totally can be. Just look at banana.dev, runpod, lambda labs, or replicate.

The issue is OpenAI is not just selling inference.

Though I wouldn’t be surprised if there were some hidden costs that are hard for us to account for due to the sheer amount of traffic they must be getting on an hourly basis.

dartos 1y ago

Oh, actually banana.dev shut down. Maybe it's not as profitable.

vrighter 1y ago

70 somethings per second is slow. So that means it does take a very significant amount of resources, considering it's running on the same or better hardware. Sustaining 70 things per second for thousands of users gets expensive really quickly.

lukev (OP) 1y ago

My point is that at current API pricing the users are paying enough to cover inference costs.
