So your assumption is that it will ultimately be the users of software themselves who will throw some everyday language at an AI, and it will reliably generate something that meets those users' intuitive expectations?
Yes. It will be at least as reliable as an average software engineer at an average company (probably more reliable than that), or at least as reliable as a self-driving car: the user says "get me to this address," and the car does it better, statistically, than an average human driver.