Google / DeepMind already did this last year. They published the paper at the end of 2022, I think. (searches) Yep:
https://www.deepmind.com/blog/competitive-programming-with-a...
It relies on the availability of good test cases to verify the code's functionality: they generate many, many possible programs, throw out all the ones that aren't syntactically correct, compile the ones that are & then test them. This works reasonably well for programming competitions & the generated code will probably improve over time.
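To make the generate-and-filter idea concrete, here's a rough Python sketch. It is not DeepMind's actual pipeline, just the shape of it as I understand it: the names `passes_tests`, `filter_candidates` and `solve` are made up for illustration, and the "generated" programs are hard-coded strings standing in for model output.

```python
def passes_tests(source, tests, func_name="solve"):
    """Return True if `source` compiles and its `func_name` passes every test."""
    try:
        # Step 1: throw out anything that isn't syntactically valid.
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return False
    namespace = {}
    try:
        # Step 2: load the candidate and run it against the test cases.
        # NOTE: exec on untrusted code is unsafe; a real system would sandbox this.
        exec(code, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Crashes and missing functions count as failures.
        return False


def filter_candidates(candidates, tests):
    """Keep only the candidate programs that survive the test suite."""
    return [src for src in candidates if passes_tests(src, tests)]


# Hypothetical "generated" programs for the task "add two numbers".
candidates = [
    "def solve(a, b): return a +",    # syntax error, rejected at compile time
    "def solve(a, b): return a - b",  # compiles, but fails the tests
    "def solve(a, b): return a + b",  # compiles and passes
]
tests = [((1, 2), 3), ((0, 0), 0)]

survivors = filter_candidates(candidates, tests)
print(survivors)
```

Everything here hinges on `tests` being given up front, which is exactly what a programming competition provides.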
The issue for "real-world" problems is that you can't verify the code without test cases, & writing good tests that ensure the code does what you want can be as much work as writing the code itself in the first place. The tests form a kind of mathematical co-domain to the code, after all, & only a small subset of problems are so simply defined that you can pin them down with just one or two tests.
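A toy Python example of what I mean by one or two tests not pinning a problem down. The functions and test values are invented for illustration: suppose the intended behaviour is "square the input".

```python
def square(x):
    # The behaviour we actually want.
    return x * x


def double(x):
    # A wrong program that happens to agree with square() on some inputs.
    return x + x


def passes(func, tests):
    """Return True if func gives the expected output for every (input, expected) pair."""
    return all(func(arg) == expected for arg, expected in tests)


weak_tests = [(0, 0), (2, 4)]            # both functions agree on these inputs
stronger_tests = weak_tests + [(3, 9)]   # one more case tells them apart

print(passes(square, weak_tests), passes(double, weak_tests))
print(passes(square, stronger_tests), passes(double, stronger_tests))
```

With only `weak_tests`, a generate-and-filter system has no way to prefer `square` over `double`. For anything less trivial than this, finding the cases that separate right from wrong is most of the work.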