0 points · modeless · 7mo ago

The reduction in hallucinations seems like potentially the biggest upgrade. If it reduces hallucinations by 75% or more over o3 and GPT-4o as the graphs claim, it will be a giant step forward. The inability to trust answers given by AI is the biggest single hurdle to clear for many applications.

hodgehog11 · 7mo ago

Agreed, this is possibly the biggest takeaway to me. If true, it will make a difference in user experience, and benchmarks like these could become the next major target.
