I think simply making the vocab more code-friendly (e.g. Codex) would make the biggest difference. Whitespace is the biggest one (afaik every space is its own token), but also consider how many languages contain `for(int i=0;`, `) {\n`, `} else {`, `import `, etc.
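To make the vocab point concrete, here is a toy sketch. It is not how any real tokenizer (GPT, Codex, or otherwise) works internally: `tokenize`, `BASE_VOCAB`, and `CODE_VOCAB` are made-up names, and the greedy longest-match rule is just a stand-in for BPE. It only shows how adding indentation runs and common code fragments to the vocab shrinks the token count for the same snippet.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenizer; unknown text falls back to single characters."""
    max_len = max((len(t) for t in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocab with no whitespace merges: every space is its own token.
BASE_VOCAB = {"if", "for", "int", "else", "import"}

# Hypothetical code-friendly vocab: indentation runs and common fragments are single tokens.
CODE_VOCAB = BASE_VOCAB | {
    "    ", "        ",
    "for(int i=0;", ") {\n", "} else {", "import ",
}

snippet = "if (x) {\n        y++;\n    } else {\n        y--;\n    }\n"

base_tokens = tokenize(snippet, BASE_VOCAB)
code_tokens = tokenize(snippet, CODE_VOCAB)

print("base vocab tokens:", len(base_tokens))
print("code vocab tokens:", len(code_tokens))
```

The same text costs noticeably fewer tokens with the second vocab, and most of the saving comes from the indentation, which is the effect I'm guessing matters most in practice.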
My understanding is that a model properly trained on multiple languages will beat an expert-based system. I feel like programming languages overlap and interop with each other enough that I wouldn't want to specialize it in just one language.