I'm sorry, but what observations support that hypothesis? There were scores of teams trying exactly that, training LLMs
directly on ARC-AGI data, and by and large they achieved mediocre results. It just isn't an approach that works for this problem set.
To be honest, your argument sounds like an attempt to motivate a predetermined conclusion.