1. Show HN: FKS2G – LLM-backed metrics for deciding how closely to review code (github.com) | 2 points by kmdupree 5d ago | 0 comments
2. Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error (philosophicalhacker.com) | 4 points by kmdupree 28d ago | 0 comments
3. Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error (philosophicalhacker.com) | 3 points by kmdupree 1mo ago | 0 comments
4. SWE-bench Verified no longer measures frontier coding capabilities (openai.com) | 343 points by kmdupree 1mo ago | 181 comments
6. Thoughts about Moments in Claude Mythos System Card (old.reddit.com) | 3 points by kmdupree 1mo ago | 0 comments
7. EsoBench: Learning a Novel Esolang via Iterative Execution Feedback (caseys-evals.com) | 1 point by kmdupree 1mo ago | 0 comments
10. Scientists just developed a new AI modeled on the human brain (livescience.com) | 4 points by kmdupree 9mo ago | 0 comments
13. Atlassian migrated 4M Postgres databases to shrink AWS bill (theregister.com) | 8 points by kmdupree 10mo ago | 0 comments
14. Libraries are under-used. LLMs make this problem worse (makefizz.buzz) | 62 points by kmdupree 11mo ago | 52 comments