
0 points · highfrequency · 21d ago · 0 comments

Can you be more specific about which math results you are talking about? Looks like a significant improvement on FrontierMath, esp. for the Pro model (most inference-time compute).


ZeroCool2u · 21d ago

FrontierMath, GPQA Diamond, and BrowseComp are the benchmarks I noticed this on.

csnweb · 21d ago

Are you maybe comparing the Pro model to the non-Pro model with thinking? Granted, it’s a bit confusing, but the Pro model is 10 times more expensive and probably much larger as well.

ZeroCool2u · 21d ago

Ah yes, okay that makes more sense!
