undefined | Better HN

0 pointsMike_123453y ago0 comments

Then by that reasoning everything in the physical world is a Markov chain, right? That is like saying that any deterministic process in time is a Markov chain.

A tennis ball in flight is a Markov chain since the state at t is a function of the state at t-1.

You have missed the point about the Attention Mechanism in GPT. That is not a Markov chain by definition.

0 comments

golol3y ago

>Then by that reasoning everything in the physical world is a Markov chain, right?

Well I guess maybe it's true that you can turn any stochastic process into a Markov Chain by changing the state space somehow (for example the states could be sample trajectories up to some finite time T). And while this is true it may be not very insightful.

But I personally think that to understand LLMs it is much better to think of the whole context window as a state rather than the individual states. If you modelled a simple register-instruction computer as a stochatic process, would you take the states to be (address last symbol written, last symbol written)? It makes much more sense to take the whole memory as a state. Similarly a transformer operates on its memory, the context window, so that should be seen as the state. This makes it clear that seeing it as just a stochastic parrot is misleading, as its all about conditioning the distribution of the next token via prompt engineering the previous tokens. And it is nevertheless a Markov chain with this state space.

Mike_12345OP3y ago

So basically you're saying it's just an algorithm running on a computer? Yes I agree with that.

golol3y ago

It is an algorithm running a computer. The software is whatever you prompt engineered. That is the algorithm running on the computer.

You know, I think that some people (I see on twitter, probably not you) have a wrong intuition about artificial intelligence. They see models which are fundamentally stochastic as incapable of really ever being truly intelligent. It's "just statistics" or just a "stochastic parrot" and it just learns probabilities instead of real meaning. Perhaps they think that since there is always randomness involved, you can not have the kind of deterministic thought process that we feel we have. The worst offender is then considered to be the old school Markov chain.

I obviously think this is wrong and that's why I like to emphasize that transformers are best interpreted as Markov Chains on a larger state space, and this does actually explain their computational behavior.

1 more reply

j / k navigate · click thread line to collapse

0 comments

golol3y ago

>Then by that reasoning everything in the physical world is a Markov chain, right?

Mike_12345OP3y ago

So basically you're saying it's just an algorithm running on a computer? Yes I agree with that.

golol3y ago

It is an algorithm running a computer. The software is whatever you prompt engineered. That is the algorithm running on the computer.

1 more reply

j / k navigate · click thread line to collapse