Obviously, that's my point.
We can do the math. GPT-4o can emit about 70 tokens a second. API pricing is $10/million for output tokens and $2.5/million for input tokens.
Assume a workload where input tokens are 10:1 with output tokens, and that I can generate continuous load (constantly generating tokens): I'll end up paying about $210/day in API fees, or $76,650 a year.
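Here's that arithmetic spelled out, in case you want to poke at it (the 70 tok/s, the 10:1 ratio, and the list prices are all assumptions, not measurements):

```python
tok_per_sec = 70                             # assumed GPT-4o output speed
seconds_per_day = 24 * 3600
out_tok = tok_per_sec * seconds_per_day      # ~6.0M output tokens/day
in_tok = out_tok * 10                        # assumed 10:1 input:output ratio
daily = out_tok * 10.00 / 1e6 + in_tok * 2.50 / 1e6
print(f"${daily:,.0f}/day, ${daily * 365:,.0f}/year")   # ~$212/day, ~$77k/year
```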
Let's assume the hardware required to serve this load is a rack of 8 H100s (probably not accurate, but likely in the ballpark). That costs about $240k.
So the hardware would pay for itself in about 3 years ($240k / $76,650 ≈ 3.1), and it probably has a service life of about double that.
Of course, we have to consider energy too. Each H100 draws 700 watts, so our rack is 5.6 kilowatts, which works out to about 49 megawatt-hours over a year. Assume wholesale electricity prices of $50/MWh (not unreasonable), and you're looking at a ~$2,500 annual energy bill.
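Same deal for the payback and energy side, under the same guessed numbers (the $240k rack price, 700 W per card, and $50/MWh are all assumptions):

```python
api_fees_per_year = 211.68 * 365          # from the API math above, ~$77k
hardware = 240_000                        # assumed price of an 8x H100 rack
print(f"payback: {hardware / api_fees_per_year:.1f} years")            # ~3.1

rack_kw = 8 * 0.700                       # 8 cards at 700 W each
mwh_per_year = rack_kw * 8760 / 1000      # ~49 MWh over a year
print(f"energy: {mwh_per_year:.0f} MWh -> ${mwh_per_year * 50:,.0f}/year at $50/MWh")
```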
So there's no reason to think that inference alone isn't a profitable business.