I've tried using a few new languages, and the LLMs would all swap the code out for syntactically similar languages, even after I told them to read the doc pages.
Whether that's for better or worse I don't know, but it does feel like the new languages are genuinely solving hard problems as their raison d'être.
LLMs thrive because they had a wealth of high-quality training data in the form of Stack Overflow, GitHub, etc., and ironically their uptake is strangling that very source of training data.
Were they to train it on their C++ codebase, it would not be effective, because internally they don't use Boost or CMake or much of the major tooling the wider C++ world relies on. It would also suggest that the user make use of all kinds of C++ libraries that aren't available outside Google. So no, they are not training on their own C++ corpus, nor would it be particularly useful.
But does Google actually train its models on its internal codebase? Considering that there's always the risk of the models leaking proprietary information and security-architecture details, I find it hard to believe they would take that chance.
We have a second, isolated model that was trained on internal code. The public Gemini AFAIK has never seen that content. The lawyers would explode.
Just out of curiosity, do you see much difference in quality between the isolated model and the public-facing ones?
Thinking about it - wasn't this the idea of Go from the start? Nothing fancy, to keep non-rocket scientists away from foot-guns, and to have everyone produce code that everyone else can understand.
Diving into a Go project, you almost always know what to expect, which is a great thing for a business.
I've always designed very large projects as a few medium-sized independent Go tools, and that strategy pays off in the era of AI-assisted coding.
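For illustration, here's a minimal sketch of that style (the tool name "ingest" and its JSON schema are hypothetical, not from the comment above): each tool is a small standalone binary with a narrow, plain-text interface, so a human or an LLM can understand it in isolation.

    // ingest is a hypothetical standalone tool: it reads raw lines
    // from stdin and emits one JSON record per line on stdout.
    // Tools in this style compose via pipes, so each one stays
    // small enough to be read and reasoned about on its own.
    package main

    import (
    	"bufio"
    	"encoding/json"
    	"fmt"
    	"os"
    	"time"
    )

    // Record is the tool's entire output contract.
    type Record struct {
    	Line string    `json:"line"`
    	Seen time.Time `json:"seen"`
    }

    func main() {
    	in := bufio.NewScanner(os.Stdin)
    	out := json.NewEncoder(os.Stdout)
    	for in.Scan() {
    		if err := out.Encode(Record{Line: in.Text(), Seen: time.Now()}); err != nil {
    			fmt.Fprintln(os.Stderr, "ingest:", err)
    			os.Exit(1)
    		}
    	}
    	if err := in.Err(); err != nil {
    		fmt.Fprintln(os.Stderr, "ingest:", err)
    		os.Exit(1)
    	}
    }

You'd then chain such tools together with pipes or files, and each binary fits comfortably in an AI assistant's context window on its own.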