I do not think this will scale. GPT o1 is presumably good for bootstrapping a project with tools the engineer is not familiar with. However, the model will struggle to update a sizable codebase with dependencies between files, where a change in one file often requires matching changes in others.
Moreover, no matter the size of the codebase or the model used, the engineer still has to review every single line before incorporating it into the project, and only a competent engineer can review code effectively.