
0 points · mlyle · 2y ago · 0 comments

I'm not sure it's a better-performing model for text. I was just testing GPT-4o on a use case (generating AP multiple-choice questions), and it repeatedly generates questions with multiple correct answers and will not fix them when prompted.

(Providing the history to GPT-4 Turbo results in it fixing the MCQ just fine.)


muzani · 2y ago

After some testing, I find it's not as good at code either. It is better at some things, but apparently benchmarks don't tell the whole story.

llm_trw · 2y ago

Yes, I've been using Phind extensively as a Google replacement, and since the move to GPT-4o the responses have gotten so much dumber.

I guess it's time to build something that lets you select which model to use after a Google search.
