Code either runs or it doesn't... but running isn't the same as doing what you actually want.
An LLM could easily generate code that concatenates raw user input straight into a SQL string. Does it run? Yeah. Is it a textbook SQL injection vulnerability? Also yeah.
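Here's a minimal sketch of what that looks like in Python (the `users` table and the `sqlite3` setup are just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

def find_user_unsafe(name: str):
    # The kind of code an LLM might produce: user input spliced
    # directly into the query string. "Works" for normal names,
    # but crafted input rewrites the query itself.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats `name` as data, not SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_user_unsafe("alice"))        # [('alice',)] -- looks fine
print(find_user_unsafe("' OR '1'='1"))  # dumps every row: injection
print(find_user_safe("' OR '1'='1"))    # [] -- the input stays inert
```

Both versions run without error. Only one of them is the code you wanted.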
And even setting security aside: if you're after a particular UX and the LLM can't get there, working code still isn't a success.