If you run an open source model with the same seed on the same hardware, it is completely deterministic: it will spit out the same answer every time. So it's not an issue with the technology, and there's nothing stopping you from writing repeatable prompts and prompting techniques.
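A minimal sketch of what that looks like in practice, assuming a local Hugging Face model (the model name here is just a placeholder, and reproducibility assumes the same hardware and library versions):

```python
# Sketch: seeded sampling with a local open-weight model, run twice.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

model_name = "gpt2"  # placeholder; any local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain gradient descent in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

completions = []
for _ in range(2):
    set_seed(42)  # fix every RNG before each run
    ids = model.generate(**inputs, do_sample=True, max_new_tokens=40)
    completions.append(tokenizer.decode(ids[0], skip_special_tokens=True))

# On the same hardware and library versions, the two completions should match.
assert completions[0] == completions[1]
```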
Whenever people talk about "prompt engineering", they're referring to randomly changing these kinds of things in the hope of finding a query pattern that gives meaningful results 90% of the time.
The reason changing one word in a prompt to a close synonym changes the reply is that LLMs embed and recover information through the specific words used in a series. The 'in a series' aspect is subtle and important. The same topic appears in the model multiple times, with different levels of treatment from casual to academic. Each treatment uses different words, similar words, but different, and that difference is very meaningful: it signals how seriously the information is being handled. Using one term rather than another causes a prompt to index into one treatment of the subject rather than another. The more formal the terms, meaning the synonyms used by experts in that area of knowledge, the more accurate the replies. Close synonyms instead generate replies drawn from outsiders to that knowledge, people who don't use the same phrases as those with the most expertise, perhaps people trying to understand who don't yet.
It is not randomly changing things in one's prompts at all. It is understanding the knowledge space one is prompting within, so that the prompts generate accurate replies. That requires knowing the correct formal terms that unlock those replies, and knowing the area also puts one in a better position to identify hallucination.
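As a rough illustration of how phrasing shifts where a prompt lands, here is a sketch using a sentence-embedding model; the model name and example phrases are purely illustrative, not anything from the thread:

```python
# Near-synonymous phrasings embed close together but not identically,
# so a prompt lands in a slightly different neighborhood of the model's space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

casual = "why does my heart beat weird sometimes"
formal = "what are the common causes of cardiac arrhythmia"

emb = model.encode([casual, formal], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())  # similar, but clearly not 1.0
```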
Higher-level programming languages may make choices for coders regarding lower-level functionality, but they have syntactic and semantic rules that produce logically consistent results. Claiming that such rules exist for LLMs but are so subtle that only the ultra-enlightened such as yourself can understand them raises the question: if hardly anyone can grasp such subtlety, then who exactly are all these massive models being built for?
Are you sure of that? Parallel scatter/gather operations may still be at the mercy of scheduling variance, because some computer math, floating-point addition in particular, is not associative.
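For example, floating-point addition depends on grouping, so a parallel reduction that sums the same values in a different order can give a different result:

```python
# Floating-point addition is not associative, so summation order matters.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed into -1e16 before the cancellation
```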
Relying on model, seed, and hardware to get "repeatable" prompts essentially reduces an LLM to a very lossy natural-language decompression algorithm. What other reason would someone have for asking the same question over and over again with the same input? If that's a problem you need to solve, then you need a database, not a deterministic LLM.