If you run an open source model with the same seed on the same hardware, it is completely deterministic: it will spit out the same answer every time. So it's not an issue with the technology, and there's nothing stopping you from writing repeatable prompts and prompting techniques.
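A minimal sketch of what that looks like in practice, assuming a local Hugging Face model (the model name here is just a placeholder, and reproducibility assumes the same hardware and library versions):

```python
# Sketch: seeded sampling with a local open-weight model, run twice.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

model_name = "gpt2"  # placeholder; any local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain gradient descent in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

completions = []
for _ in range(2):
    set_seed(42)  # fix every RNG before each run
    ids = model.generate(**inputs, do_sample=True, max_new_tokens=40)
    completions.append(tokenizer.decode(ids[0], skip_special_tokens=True))

# On the same hardware and library versions, the two completions should match.
assert completions[0] == completions[1]
```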
Whenever people talk about "prompt engineering", they're referring to randomly changing these kinds of things in the hope of finding a query pattern that gives meaningful results 90% of the time.
The reason changing one word in a prompt to a close synonym changes the reply is that LLMs embed and recover information through the specific words used in a series. The 'in a series' aspect is subtle and important. The same topic appears in the model multiple times, with different levels of treatment from casual to academic. Each treatment uses different words, similar words, but different, and that difference is very meaningful: it signals how seriously the information is being handled. Using one term rather than another causes a prompt to index into one treatment of the subject rather than another. The more formal the terms, meaning the synonyms used by experts in that area of knowledge, the more accurate the replies. Close synonyms instead generate replies drawn from outsiders to that knowledge, people who don't use the same phrases as those with the most expertise, perhaps people trying to understand who don't yet.
It is not randomly changing things in one's prompts at all. It is understanding the knowledge space one is prompting within, so that the prompts generate accurate replies. That requires knowing the correct formal terms that unlock those replies, and knowing the area also puts one in a better position to identify hallucination.
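As a rough illustration of how phrasing shifts where a prompt lands, here is a sketch using a sentence-embedding model; the model name and example phrases are purely illustrative, not anything from the thread:

```python
# Near-synonymous phrasings embed close together but not identically,
# so a prompt lands in a slightly different neighborhood of the model's space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

casual = "why does my heart beat weird sometimes"
formal = "what are the common causes of cardiac arrhythmia"

emb = model.encode([casual, formal], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())  # similar, but clearly not 1.0
```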
Higher-level programming languages may make choices for coders regarding lower-level functionality, but they have syntactic and semantic rules that produce logically consistent results. Claiming that such rules exist for LLMs but are so subtle that only the ultra-enlightened such as yourself can understand them raises the question: if hardly anyone can grasp such subtlety, then who exactly are all these massive models being built for?
Are you sure of that? Parallel scatter/gather operations may still be at the mercy of scheduling variance, because some computer math, floating-point addition in particular, is not associative.
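For example, floating-point addition depends on grouping, so a parallel reduction that sums the same values in a different order can give a different result:

```python
# Floating-point addition is not associative, so summation order matters.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed into -1e16 before the cancellation
```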
Relying on model, seed, and hardware to get "repeatable" prompts essentially reduces an LLM to a very lossy natural-language decompression algorithm. What other reason would someone have for asking the same question over and over again with the same input? If that's a problem you need to solve, then you need a database, not a deterministic LLM.