My benchmark for large language models
(nicholas.carlini.com)
4 points
cheviethai123
2y ago
2 comments
cheviethai123
OP
2y ago
Notice how low Gemini scores here compared with other LLM tests. I'm also impressed that the evaluation method can assess performance without relying on tailored prompts.
hoamatcuoi
2y ago
But the benchmark only scores Gemini-Pro 1. I'm curious how Gemini Ultra would perform here, but I guess we can't know yet.