Inference is cheaper in gross terms, since each job takes less time, but the cost of that time (per second) is the same as for training.
We can do the math. GPT-4o can emit about 70 tokens a second. API pricing is $10/million for output tokens and $2.5/million for input tokens.
Assuming a workload where input tokens outnumber output tokens 10:1, and that I can generate continuous load (constantly generating tokens), I'll end up paying about $210/day in API fees, or $76,650 in a year.
Let's assume the hardware required to service this load is a rack of 8 H100s (probably not accurate, but likely in the ballpark). That costs about $240k.
So the hardware would pay for itself in 3 years. It probably has a service life of about double that.
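For anyone who wants to poke at the assumptions, here's a rough sketch of the arithmetic above in Python. Every constant is just the guess from this comment (70 tokens/s, 10:1 input to output, $240k for the hardware), and the function names are made up for illustration. None of it is measured data.

```python
# Back-of-envelope check of the API-fee and payback numbers above.
# All inputs are assumptions from the comment, not measured figures.

OUTPUT_TOKENS_PER_SEC = 70      # assumed GPT-4o output rate
INPUT_TO_OUTPUT_RATIO = 10      # assumed 10:1 input:output workload
OUTPUT_PRICE_PER_M = 10.00      # USD per million output tokens
INPUT_PRICE_PER_M = 2.50        # USD per million input tokens
HARDWARE_COST = 240_000         # USD, assumed price of 8 H100s

SECONDS_PER_DAY = 24 * 60 * 60


def daily_api_cost() -> float:
    """API fees in USD for one day of continuous generation."""
    output_tokens = OUTPUT_TOKENS_PER_SEC * SECONDS_PER_DAY
    input_tokens = output_tokens * INPUT_TO_OUTPUT_RATIO
    total = output_tokens * OUTPUT_PRICE_PER_M + input_tokens * INPUT_PRICE_PER_M
    return total / 1_000_000


def payback_years() -> float:
    """Years of API fees needed to equal the hardware cost (energy ignored)."""
    return HARDWARE_COST / (daily_api_cost() * 365)


if __name__ == "__main__":
    print(f"daily API cost:  ${daily_api_cost():,.2f}")
    print(f"yearly API cost: ${daily_api_cost() * 365:,.0f}")
    print(f"payback period:  {payback_years():.1f} years")
```

Unrounded, that comes to roughly $212/day and $77k/year, so the payback is a bit over 3 years. Changing the input:output ratio or the tokens/s figure moves the result a lot, which is the main caveat here.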
Of course we have to consider energy too. Each H100 draws 700 watts, meaning our rack draws 5.6 kilowatts, so we're looking at about 49 megawatt-hours to operate for the year. Let's assume they pay wholesale electricity prices of $50/MWh (not unreasonable), and we're looking at a ~$2,500 annual energy bill.
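The energy estimate works the same way. This is a minimal sketch using only the figures above, with hypothetical function names. It counts GPU draw only and leaves out CPUs, cooling and networking, so treat it as a floor.

```python
# Rough annual energy bill for the assumed 8x H100 setup.
# Counts GPU draw only; cooling and other overhead are ignored (assumption).

GPU_COUNT = 8
WATTS_PER_GPU = 700         # per-GPU draw assumed in the comment
PRICE_PER_MWH = 50.0        # USD, assumed wholesale electricity price
HOURS_PER_YEAR = 24 * 365


def annual_energy_mwh() -> float:
    """Energy used in a year of continuous operation, in megawatt-hours."""
    watt_hours = GPU_COUNT * WATTS_PER_GPU * HOURS_PER_YEAR
    return watt_hours / 1_000_000


def annual_energy_cost() -> float:
    """Yearly electricity cost in USD at the assumed wholesale price."""
    return annual_energy_mwh() * PRICE_PER_MWH


if __name__ == "__main__":
    print(f"energy: {annual_energy_mwh():.1f} MWh/year")
    print(f"cost:   ${annual_energy_cost():,.0f}/year")
```

Even if real overhead doubled or tripled that figure, it would still be small next to the API fees estimated earlier.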
So there's no reason to think that inference alone isn't a profitable business.
It's not unusual for a startup to be unprofitable, and this one obviously is. But I'm not sure why isolating one aspect of their business and declaring it profitable would justify the idea that this company is inevitably a good investment "even if the company went defunct tomorrow".
Perhaps you meant "win" in the sense of "being influential" or something, but I'm pretty sure the people who invested billions of dollars use definitions that involve more concrete returns on their investment.
I was responding to someone upthread suggesting that they were running even inference at a loss.
The issue is OpenAI is not just selling inference.
Though I wouldn’t be surprised if there were some hidden costs that are hard for us to account for due to the sheer amount of traffic they must be getting on an hourly basis.