I agree that Opus almost definitely isn't anywhere near that big, but AWS throughput might not be a great way to measure model size.
According to OpenRouter, AWS serves the latest Opus and Sonnet at roughly the same speed. It's likely that they simply allocate hardware differently per model.
My understanding is that for a top-K MoE architecture, total model size doesn't really matter for per-token compute: whether you have 10 32GB experts or a thousand, if only 2-3 of them are active per token, your inference workload is identical; only your storage footprint and disk traffic increase.
Which seems consistent with how hungry the industry has lately been for hard drives.
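The routing point above can be sketched with a toy top-K MoE layer (all sizes and names here are hypothetical, purely to illustrate that per-token FLOPs depend on K, the number of active experts, not on the total expert count):

```python
import numpy as np

# Toy top-K MoE routing sketch (illustrative, not any real model's code).
# Per-token compute scales with top_k, not num_experts: raising
# num_experts from 16 to 1000 grows parameter storage, but the forward
# pass still runs only top_k expert MLPs per token.

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
num_experts, top_k = 16, 2  # hypothetical sizes

# Each expert is a small 2-layer MLP; together they dominate storage.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route one token vector through its top-K experts only."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]            # pick the K best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                     # softmax over just those K
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # only K experts execute
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)
print(y.shape)  # (64,)
```

The loop body runs exactly `top_k` times regardless of `num_experts`, which is why serving speed alone says little about total parameter count.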