And so they decide not to disclose their own training information, just after telling everyone how useful it was to get DeepSeek's? Honestly, I can't say I care about "nearly as good as o1" when it's a closed API with no additional info.
You can safely assume Qwen2.5-Max will score worse than all of the recent reasoning models (o1, DeepSeek-R1, Gemini 2.0 Flash Thinking).
It'll probably become a very strong model if/when they apply RL training for reasoning. However, all the successful recipes for this are closed source, so it may take some time. They could do SFT on another model's reasoning chains in the meantime, though the DeepSeek-R1 technical report noted that this is not as good as RL training.
I don't remember the last time 20% of the HN front page was about the same thing. Then again, nobody remembers the last time a company's market cap fell by hundreds of billions of dollars the way NVIDIA's did yesterday.
Source: https://x.com/Alibaba_Qwen/status/1884263157574820053
https://apnews.com/article/deepseek-ai-artificial-intelligen...
> [...] we are unable to access the proprietary models such as GPT-4o and Claude-3.5-Sonnet. Therefore, we evaluate Qwen2.5-Max against DeepSeek V3
"We'll compare our proprietary model to other proprietary models. Except when we don't. Then we'll compare to non-proprietary models."