1. MedEvalArena: Peer-judged LLM medical reasoning benchmark (danbernardo.substack.com) | 1 | docere | 1mo ago | 0
2. LLM Failure Modes in Medical QA Arising from Inflexible Reasoning (arxiv.org) | 3 | docere | 1y ago | 0