They are literally Markov chains according to the mathematical definition. The code is complicated. Having complicated code doesn't mean it's not literally a Markov chain.
> I implemented Markov chains in BASIC in about ten lines of code in the 1980s on a 1 MHz 64K Apple II after reading about the famous Mark V. Shaney hoax (https://en.wikipedia.org/wiki/Mark_V._Shaney). No neural nets or fancy GPUs required.
I don't doubt this. You can make a Markov chain just by counting the frequency of the letter that follows each letter, giving one with a context window of one or two characters. That is a very simple Markov chain; you can build it by hand. You can make ones with a larger context window, like a dozen characters or a few words, using sophisticated smoothing and regularization methods rather than raw frequency counts. Those are also simple Markov chains you can build without a neural net or GPU. Then you can make a Markov chain with a context window of thousands of tokens, built from neural nets, massive training data, and differentiable tensor libraries running on data centers full of hardware linear-algebra accelerators. Those are some even bigger Markov chains!
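A frequency-count character Markov chain of the first kind can be sketched in a few lines of Python. This is my own illustrative sketch, not the BASIC program from the quoted comment; the function names are invented for the example:

```python
import random
from collections import defaultdict, Counter

def train(text, order=2):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts

def generate(counts, seed, length=50):
    """Sample each next character in proportion to its observed frequency."""
    out = seed
    order = len(seed)
    for _ in range(length):
        nexts = counts.get(out[-order:])
        if not nexts:  # context never seen in training text
            break
        chars, weights = zip(*nexts.items())
        out += random.choices(chars, weights=weights)[0]
    return out

model = train("the cat sat on the mat and the cat ran", order=2)
print(generate(model, "th"))
```

The "state" is just the last `order` characters, and the next character depends only on that state, which is exactly the Markov property. Widening `order` (or switching the units from characters to words or tokens) changes the context window, not the mathematical structure.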
> LLMs are way more complicated than simple Markov chains.
That's true, they are more complicated than simple Markov chains, if by simple Markov chains you mean ones with a small context window. LLMs are Markov chains with a large context window!