I also don't think the only way to improve LLMs is by improving zero-shot inference. Have you ever written code zero-shot style that compiled and worked on the first try? It's a multistep process, and agents and planning will probably be the next step for LLMs.
Cheap inference helps a lot here, since you can hand the AI a task at night, go to sleep, and review the results in the morning. This way the AI is brute-forcing the solution by trying many different paths, but that's kind of how most programming works anyway. You try many things until there are no errors, the code compiles, and it passes the tests.