They are literally Markov chains according to the mathematical definition. The code is complicated. Having complicated code doesn't mean it's not literally a Markov chain.
> I implemented Markov chains in BASIC in about ten lines of code in the 1980s on a 1 MHz 64K Apple II after reading about the famous Mark V. Shaney hoax (https://en.wikipedia.org/wiki/Mark_V._Shaney). No neural nets or fancy GPUs required.
I don't doubt this. You can make a Markov chain just by counting the frequency of the letter that follows each letter, giving one with a context window of one or two characters. That is a very simple Markov chain; you can build it by hand. You can make ones with a larger context window, like a dozen characters or a few words, using sophisticated smoothing and regularization methods rather than raw frequency counts. Those are also simple Markov chains you can build without a neural net or GPU. Then you can make a Markov chain with a context window of thousands of tokens, built from neural nets, massive training data, and differentiable tensor libraries running on data centers full of hardware linear-algebra accelerators. Those are some even bigger Markov chains!
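A frequency-count character Markov chain of the first kind can be sketched in a few lines of Python. This is my own illustrative sketch, not the BASIC program from the quoted comment; the function names are invented for the example:

```python
import random
from collections import defaultdict, Counter

def train(text, order=2):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def generate(counts, seed, length=50):
    """Sample each next character in proportion to its observed frequency."""
    out = seed
    order = len(seed)
    for _ in range(length):
        nexts = counts.get(out[-order:])
        if not nexts:  # context never seen in training text
            break
        chars, weights = zip(*nexts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

model = train("the cat sat on the mat and the cat ran", order=2)
print(generate(model, "th"))
```

The "state" is just the last `order` characters, and the next character depends only on that state, which is exactly the Markov property. Widening `order` (or switching the units from characters to words or tokens) changes the context window, not the mathematical structure.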
> LLMs are way more complicated than simple Markov chains.
That's true, they are more complicated than simple Markov chains, if by simple Markov chains you mean ones with a small context window. LLMs are Markov chains with a large context window!