Tripling an LLM's ARC-AGI-2 score with code evolution

(imbue.com)

17 points by danielmewes 2mo ago | 2 comments

2 comments

danielmewes (OP) 2mo ago

Author here.

We originally developed our evolution-inspired tool to optimize LLM prompts. To our surprise, we found that the same method also worked well for getting better performance out of a base model on ARC-AGI tasks.

We're open-sourcing the evolver tool today. It is built to be adapted to many different optimization problems (some coding required). You can read more about it at https://imbue.com/research/2026-02-27-darwinian-evolver/

Happy to answer questions!

mrtibbets 2mo ago

Clean breakdown. The reasoning vs scaling framing makes sense.
