This is what you get for relying on the generosity of billionaires. Keep offshoring your thinking ability to a machine and let me know how competitive you are. Hint: you won't be. There's nothing special about being able to use an LLM.
But even when it happens I doubt it would be as cheap as it is right now. Enjoy it while it lasts!
Please go run some numbers. The hardware needed to run DeepSeek V4 Flash at 20 tps for a single session is nowhere close to what is required to run it at 50 tps for 5,000 concurrent sessions.
Imagine what it takes to be profitable when running at 150 tps for 30 cents per 1M tokens. You make less than 1k per month, and the hardware required to run that costs 10k a month to rent with hardly any concurrent session capability.
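To put a rough number on that, here is a minimal sketch of the single-session revenue math. It assumes a 30-day month, 100% utilization, and that only output tokens are billed; the function name and those assumptions are mine, not the parent's.

```python
import math

SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # assumption: 30-day month, always busy


def monthly_revenue(tokens_per_second: float, usd_per_million: float) -> float:
    """Revenue from one stream generating tokens nonstop for a month."""
    tokens = tokens_per_second * SECONDS_PER_MONTH
    return tokens / 1_000_000 * usd_per_million


# Figures from the comment: 150 tps, $0.30 per 1M tokens, $10k/month rent.
revenue = monthly_revenue(150, 0.30)
rent = 10_000

# How many such always-busy streams would be needed to cover the rent.
streams_to_break_even = math.ceil(rent / revenue)

print(f"revenue per stream: ${revenue:.2f}/month")
print(f"streams needed to cover rent: {streams_to_break_even}")
```

Under those assumptions one stream brings in a bit over $100 a month, so the hardware would need dozens of fully loaded concurrent streams just to cover rent.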
- DeepSeek serves DeepSeek V4 Pro at 27 tps: https://openrouter.ai/deepseek/deepseek-v4-pro
- At 27 tps per user, a B300 GPU will give you around 800 tokens per second (serving 30 users): https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
- That's 800 * 60 * 60 generated tokens per hour, at a cost of $0.87 per 1M tokens, or $2.50 per hour.
- For input and output tokens, the math is a bit more complicated because we have to make assumptions about their ratio. Using the published values from OpenCode, we get another $2.50 for cached tokens (which are almost free for DeepSeek) and another $3.40 for input tokens (which are a lot cheaper to compute than output tokens), which gives us a total of $8.50 per hour per B300 GPU.
- B300 GPUs can be rented for as low as $3.40 per hour, which is less than $8.50, so hosting DeepSeek V4 Pro is profitable.
You could also host it at fewer tps per user to raise the efficiency and therefore the profit even higher.
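For anyone who wants to check the per-GPU arithmetic, here is a small sketch that reproduces it. The cached and input figures are taken as given from the comment (I have not re-derived them from the OpenCode ratios), and the function and variable names are just for illustration.

```python
def hourly_output_revenue(aggregate_tps: float, usd_per_million: float) -> float:
    """Revenue per hour from output tokens at a given aggregate throughput."""
    tokens_per_hour = aggregate_tps * 60 * 60
    return tokens_per_hour / 1_000_000 * usd_per_million


# Figures quoted in the comment above.
output_rev = hourly_output_revenue(800, 0.87)  # ~30 users at ~27 tps each
cached_rev = 2.50   # taken as given, not re-derived
input_rev = 3.40    # taken as given, not re-derived
rent_per_hour = 3.40

total_rev = output_rev + cached_rev + input_rev
margin = total_rev - rent_per_hour

print(f"output: ${output_rev:.2f}/h, total: ${total_rev:.2f}/h, margin: ${margin:.2f}/h")
```

Summing the rounded figures this way gives slightly less than the quoted $8.50 per hour, presumably because of rounding in the intermediate values. Either way the total stays well above the $3.40 rental price, as long as the GPU is kept fully loaded.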
There’s a pretty significant difference between saying someone tripled their prices and saying a temporary promotion ended. It’s even more so the case if someone is using it as an example of a trend of rising prices.
I’m 100% in the camp that prices are going up and quality is going down; companies are retiring models and requiring you to use more expensive ones. This has happened to me and there are dozens of examples that one can point to.
But a promotion ending is a strawman argument and does the point a disservice.
Yes. Did they double their MSRP? No. They did double their effective price for me, which is all that matters unless you're doing economic math or something.
Smh, it's all downhill from the first unadulterated neuron.
Everyone could see this coming from miles away, everyone warned that this would happen again and again and again, and it always got dismissed.
I think it is priced high because it's basically their smartest model as well as their fastest, so why shouldn't they?
You can still use earlier generations of Flash at a lower cost if you want "fast and cheap and just OK," which often makes sense. (Just checked)
I would predict they will lower this price when 3.5 High appears, but perhaps not all the way.