wokwokwok 2y ago

That's not a metric.

That's a use case.

Certainly, no one here is disputing that there are things OpenAI refuses to allow, and given that the effectiveness of using GPT-4 on them is literally zero, a sweet potato connected to a spring and a keyboard will "beat" GPT-4, if that's your scoring metric.

If you want a meaningful comparison you need tasks that both tools are capable of doing, and then see how effective they are.

Claiming that Mistral Medium beats it is like me claiming that RenderMan beats DALL-E 2 at rendering 3D models; yes, technically they both generate images, but since it's not possible to use DALL-E 2 to render a 3D model, it's not really a meaningful comparison, is it?


theshackleford 2y ago

> If you want a meaningful comparison you need tasks that both tools are capable of doing, and then see how effective they are.

The fact that it’s incapable of simple requests that an alternative can handle is absolutely part of a worthwhile comparison.

wokwokwok (OP) 2y ago

You’re just twisting what “best” means to suit your bias.

That is not a measure of how sophisticated and capable a model is.

GPT-4 is a more sophisticated, more capable model than Mistral.

If that doesn’t make it the “better” one for you, that’s fine; but any attempt to argue about the capabilities of the models on that basis is misguided.

Restrictions placed on a model are an orthogonal concern to its capabilities.

…but sure, you can invent some benchmarks to score models on other criteria, which is entirely valid.

It’s perfectly fair to say that GPT-4 doesn’t top all possible metrics… only the meaningful ones about model capabilities.

bambax 2y ago

Semantics.

Both tools are generative systems that produce text in response to a prompt. If Mistral were mute on random topics for no other reason than that its makers dislike talking about them, would you say it doesn't count?
