And to recap their claim that it's the second most powerful model: it's based on MMLU scores, which IMO is not a useful comparison. (Also, they don't test against GPT-4-Turbo or Claude-long-2.1.)
What they're saying is that Inflection-2 ranks #2 relative to other models including GPT-4, Claude-2, PaLM 2, Grok-1, and Llama 2 70b, specifically on MMLU scores.
This model could be great, but that will be determined by whether day-to-day users, both free and paying, prefer it over Claude 2 and GPT-4-Turbo, not by MMLU scores.
[1]: https://huggingface.co/datasets/lukaemon/mmlu/viewer/abstrac...