My understanding is that a model properly trained on multiple languages will beat an expert-based system. I feel like programming languages overlap and interoperate with each other enough that I wouldn't want to specialize it in just one language.
The vocab size of Llama 2 is 32,000. Personally, I don't think programming languages differ enough for a language-specific vocab to save any meaningful number of tokens, considering the size of the current vocab.
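To put a rough number on that, here is a back-of-the-envelope sketch of how much of the model actually scales with vocab size. It assumes the Llama 2 7B shape (hidden size 4096, separate input embedding and output projection) and an approximate total parameter count; those figures are not from the comment above, so treat them as assumptions.

```python
# Assumed Llama 2 7B shape: 32,000-token vocab, hidden size 4096,
# separate (untied) input embedding and output projection matrices.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 4_096
TOTAL_PARAMS = 6_738_415_616  # assumed approximate parameter count for the 7B model


def vocab_params(vocab_size: int, hidden_size: int = HIDDEN_SIZE, untied: bool = True) -> int:
    """Parameters that scale with vocab size: embedding rows, plus the output head if untied."""
    matrices = 2 if untied else 1
    return matrices * vocab_size * hidden_size


full = vocab_params(VOCAB_SIZE)
halved = vocab_params(VOCAB_SIZE // 2)

print(f"vocab-dependent params: {full:,} ({full / TOTAL_PARAMS:.1%} of the model)")
print(f"saved by halving the vocab: {full - halved:,} ({(full - halved) / TOTAL_PARAMS:.1%})")
```

Under these assumptions the vocab-dependent matrices are only about 4% of the model, so even halving the vocab frees up about 2% of the parameters. This only counts weights; it says nothing about how many tokens a given source file encodes to.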
https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4...
It looks like if you just limit it to English it'd cut the count almost by half - further limiting the vocab to a specific programming language could cut it down even more. Pure armchair theory-crafting on my part, no idea if limiting vocab is even a reasonable way to improve context handling. But it's an interesting idea - build on a base, then specialize as needed, and let the user swap out the LLM on an as-needed basis (or the front-end tool could simply detect the language of the project). 3B or smaller models with very long context which excel at one specific thing could be really useful (e.g. a local code completer for English TypeScript projects).
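For anyone who wants to check the "almost by half" estimate themselves, a minimal sketch of one way to measure it is below. It uses "ASCII-only after stripping the SentencePiece word-boundary marker" as a crude stand-in for "English", which is my assumption and not a real language filter; `ascii_only_share` and `toy_vocab` are hypothetical names, and a real check would feed in the tokenizer's actual vocab list.

```python
SPIECE_SPACE = "\u2581"  # SentencePiece word-boundary marker, not part of the text itself


def ascii_only_share(vocab: list[str]) -> float:
    """Fraction of vocab entries that are pure ASCII once the boundary marker is removed."""
    if not vocab:
        return 0.0
    kept = [tok for tok in vocab if tok.replace(SPIECE_SPACE, "").isascii()]
    return len(kept) / len(vocab)


# Hypothetical toy vocab for illustration only; not taken from any real tokenizer.
toy_vocab = [
    "\u2581the", "\u2581function", "ing", "\u2581const",  # ASCII pieces
    "\u0434\u0430", "\u4e2d", "\u2581\u00fcber",          # Cyrillic, CJK, accented Latin
    "=>",
]

print(ascii_only_share(toy_vocab))
```

Running the same function over a real vocab would give a first-order answer on how much an English-only cut removes, though it would also keep ASCII-only tokens from other Latin-script languages.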