Better HN
Stevvo
3mo ago
The variance is way too high for this test to have any value at all. I ran it 10 times, and each pelican on a bicycle was a better rendition than that; you could say about half of them were perfect.
golly_ned
3mo ago
Compared to the other benchmarks, which are much more gameable, I trust PelicanBikeEval way more.
getnormality
3mo ago
Well, the variance is itself interesting.