
0 points · startages · 1mo ago · 0 comments

This is misleading. I'm running a live experiment here: https://project80.divcrafts.com/

There are 4 models, all receiving the exact same prompts a few times a day, required to respond with a specific action.

In the first experiment I used gemini-3-pro-preview. It spent ~$18 on the same task where Opus 4.5 spent ~$4, GPT-5.1 spent ~$4.50, and Grok spent ~$7. Pro was burning through money so fast that I switched to gemini-3-flash-preview, and it's still outspending every other model on identical prompts. The new experiment is showing the same pattern.

Most of the cost appears to be reasoning tokens.

The takeaway: Gemini spends significantly more on reasoning tokens to produce lower-quality answers, while Opus thinks less and delivers better results. A lower per-token price doesn't matter much when the model needs 4x the tokens to get there.
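To make the price-versus-volume point concrete, here is a rough sketch of the arithmetic. The prices and token counts are hypothetical placeholders, not real pricing or numbers measured in the experiment; the only thing it illustrates is that a model priced at a third of another still costs more if it needs 4x the tokens.

```python
def total_cost(price_per_mtok: float, tokens: int) -> float:
    """Dollar cost of `tokens` tokens at `price_per_mtok` dollars per million tokens."""
    return price_per_mtok * tokens / 1_000_000


# Hypothetical values for illustration only (NOT real pricing or measured usage).
cheap_price = 5.0          # $/1M tokens for the "cheap per token" model
pricey_price = 15.0        # $/1M tokens for the "expensive per token" model
baseline_tokens = 100_000  # tokens the pricier model needs for the task
token_multiplier = 4       # the cheaper model needs 4x as many tokens

cheap_model_cost = total_cost(cheap_price, baseline_tokens * token_multiplier)
pricey_model_cost = total_cost(pricey_price, baseline_tokens)

# The cheaper-per-token model only wins if its token multiplier stays
# below the price ratio between the two models.
break_even_multiplier = pricey_price / cheap_price

print(f"cheap-per-token model:  ${cheap_model_cost:.2f}")
print(f"pricey-per-token model: ${pricey_model_cost:.2f}")
print(f"break-even token multiplier: {break_even_multiplier:.1f}x")
```

With these made-up numbers the break-even multiplier is the price ratio, so anything above it erases the per-token discount.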


camel_Snake · 1mo ago

Is that no longer the case, or am I misunderstanding the operational costs displayed?

Opus: 521k input tokens; 12k out

Grok: 443k input tokens; 57k out

Gemini: 677k input tokens; 7k out

OAI: 543k input tokens; 17k out

Gemini appears to use by far the fewest reasoning tokens, assuming they're included in the output counts.
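For anyone who wants to check the comparison, here is a quick sketch that ranks the four models by output tokens and by output share of total tokens. It uses only the rounded counts quoted above and assumes, as stated, that reasoning tokens are counted as output; if the dashboard reports them separately, the ranking could look different.

```python
# (input_tokens, output_tokens), taken from the rounded "k" figures quoted above.
usage = {
    "Opus": (521_000, 12_000),
    "Grok": (443_000, 57_000),
    "Gemini": (677_000, 7_000),
    "OAI": (543_000, 17_000),
}


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of all tokens that were output (assumed to include reasoning)."""
    return output_tokens / (input_tokens + output_tokens)


# Models ordered from fewest to most output tokens.
by_output = sorted(usage, key=lambda model: usage[model][1])

for model in by_output:
    inp, out = usage[model]
    print(f"{model:>6}: {out:>6} out, {output_share(inp, out):.1%} of all tokens")
```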
