1. ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases (arxiv.org) | 2 | BalinKing | 15h ago | 0