However, if you are willing to stub your toes, retry, and pay more money, an entire new world opens up. Languages like Python seem to fall apart faster in extremely large projects.
I've got a collection of interdependent .NET codebases with about 50 MB of raw source between them. C# being strongly typed seems like an essential backbone for keeping everything on rails in my agentic scenarios. The code edits have been flawless for several months now. I've got successful apply_patch usages that touch 20 files at a time. LLM code editing performance might be mostly language agnostic once we compensate for the strictness of the type system, or more specifically, for how much useful information is returned at compile time.
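A minimal sketch of what feeding compile-time information back to the model could look like, assuming the build prints MSBuild-style diagnostic lines of the form file(line,col): severity CODE: message [project]. The helper names (parse_diagnostics, format_feedback, run_build) and the ORD001 rule ID in the comments are hypothetical, and this is not a description of any particular agent's internals.

```python
import re
import subprocess
from dataclasses import dataclass
from typing import List

# Assumed line shape, e.g.:
#   src/Orders.cs(42,17): error CS0103: The name 'total' does not exist ... [/repo/Orders.csproj]
# Custom analyzer IDs (say, a hypothetical ORD001) fit the same pattern.
DIAGNOSTIC = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\):\s+"
    r"(?P<severity>error|warning)\s+(?P<code>[A-Za-z]+\d+):\s+"
    r"(?P<message>.*?)(?:\s+\[(?P<project>[^\]]+)\])?$"
)


@dataclass(frozen=True)
class Diagnostic:
    file: str
    line: int
    col: int
    severity: str
    code: str
    message: str


def parse_diagnostics(build_output: str) -> List[Diagnostic]:
    """Extract unique diagnostics from build output, preserving first-seen order."""
    seen = set()
    results = []
    for raw in build_output.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match is None:
            continue  # progress messages, summaries, blank lines
        diag = Diagnostic(
            file=match["file"],
            line=int(match["line"]),
            col=int(match["col"]),
            severity=match["severity"],
            code=match["code"],
            message=match["message"],
        )
        # Build output may repeat a diagnostic (for example in a summary); keep one copy.
        if diag not in seen:
            seen.add(diag)
            results.append(diag)
    return results


def format_feedback(diags: List[Diagnostic], max_items: int = 20) -> str:
    """Render diagnostics as compact lines for a model prompt, errors first."""
    ordered = sorted(diags, key=lambda d: d.severity != "error")  # stable sort
    return "\n".join(
        f"{d.file}:{d.line}:{d.col} {d.severity} {d.code}: {d.message}"
        for d in ordered[:max_items]
    )


def run_build(project_dir: str) -> List[Diagnostic]:
    """Run `dotnet build` in project_dir and return its parsed diagnostics."""
    proc = subprocess.run(
        ["dotnet", "build", "--nologo"],
        cwd=project_dir, capture_output=True, text=True,
    )
    return parse_diagnostics(proc.stdout + "\n" + proc.stderr)
```

In a loop, the output of format_feedback(run_build(path)) would be appended to the next prompt after each patch, so the model sees exactly which file, line, and rule it broke.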
Compile-time errors and warnings are probably the most powerful alignment mechanism available. Some ecosystems let you define your own classes of errors and warnings. I think tools like Roslyn Analyzers might be more powerful than unit tests in this application. Domain-specific compilation feedback feels like the holy grail to me.
https://learn.microsoft.com/en-us/visualstudio/code-quality/...