1. Did Google's AI agents build an operating system for $916? (normaltech.ai) | 4 points by randomwalker 2d ago | 0 comments
2. Open-world evaluations for measuring frontier AI capabilities [pdf] (cruxevals.com) | 2 points by randomwalker 1mo ago | 0 comments
4. When AI Builds AI – Findings from a Workshop on Automation of AI R&D [pdf] (cset.georgetown.edu) | 1 point by randomwalker 3mo ago | 0 comments
5. The Longitudinal Expert AI Panel: Understanding Expert Views on AI [pdf] (static1.squarespace.com) | 1 point by randomwalker 6mo ago | 0 comments
6. Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (arxiv.org) | 1 point by randomwalker 7mo ago | 0 comments
8. Could AI slow science? Confronting the production-progress paradox (aisnakeoil.com) | 2 points by randomwalker 10mo ago | 0 comments
10. Why an overreliance on AI-driven modelling is bad for science (nature.com) | 1 point by randomwalker 1y ago | 0 comments
12. We Looked at 78 Election Deepfakes. Political Misinformation Isn't an AI Problem (knightcolumbia.org) | 5 points by randomwalker 1y ago | 0 comments
13. Inference Scaling FLaws: The Limits of LLM Resampling with Imperfect Verifiers (arxiv.org) | 3 points by randomwalker 1y ago | 0 comments
14. Is the UK's liver transplant matching algorithm biased against younger patients? (aisnakeoil.com) | 93 points by randomwalker 1y ago | 62 comments
15. Core-Bench: Computational Reproducibility Agent Benchmark (arxiv.org) | 1 point by randomwalker 1y ago | 0 comments