https://www.dartmouth.edu/~chance/teaching_aids/books_articl...
Or starting from the basics, and learning how to actually do the number crunching, this is unusually good (Stewart, Introduction to numerical solution of Markov Chains):
https://press.princeton.edu/titles/5640.html
Robert Gallager's MIT lecture series, very well presented, titled Principles of Digital Communications, takes you on another train based on Markov Chains (Kalman filters, etc).
https://ocw.mit.edu/courses/electrical-engineering-and-compu...
Markov Chain Monte Carlo to sample from probability distributions is a good start - https://arxiv.org/abs/1206.1901 if you are into sampling.
45% http://setosa.io/ev/markov-chains/
30% https://en.wikipedia.org/wiki/Markov_chain
25% Youtube
Covers a ton of ground, and gives concrete examples to motivate the ideas.
2. Poisson processes.
3. The Markov property.
4. Stochastic processes.
5. Realise that you’re missing a background in analysis, therefore you don’t know sh?t about measure theory but you actually need it to know anything deeper . Wonder to yourself if you really want to spend the next 3 years getting a maths background you don’t have.
6. Convince yourself that it’s all just engineering and middle through by picking a project involving non trivial markov chain.
7. Go back and spend 3 years doing foundational maths then repeat point 1-5.
In discrete time and discrete space, it mostly just reduces to linear algebra.
However, if you actually want a background in the theory of Markov chains, I don’t think this approach works.
Most interesting facts about Markov chains (e.g. the Stationary Distribution Theorem) really are probabilistic generalisations of simpler facts about FSAs (e.g. FSAs cannot be used to "count"). In my experience, understanding them first for FSAs and then seeing how they generalise for the probabilitic case is a good way of approaching this subject.
[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...
it can be implemented in a few lines of code, that's the beauty of it: https://github.com/justindomingue/markov_chains/blob/master/...
obviously then you could take the previous n words into account, tweak the starting word, add randomness, etc.
now replace "word" with "state" and "probability(next state | previous state)" to edges of a graph: https://static1.squarespace.com/static/54e50c15e4b058fc6806d...
and you got a generic markov chain :)
footnotes: p(A | B) is probability of A given B, e.g. p(rain | clouds) > p(rain | sun) :)
The Charniak book is primarily about HMMs and quite short, so it's the best introduction to the subject. Manning and Schütze and Jurafsky and Martin are much more extensive and cover pretty much all of statistical NLP up to their publication date (so no LSTMs if I remember correctly) but they are required reading for an in-depth approach.
You will definitely want to go beyond HMMs at some point, so you will probably want the other two books. But, if you really just want to know about HMMs, then start with the Charniak.
______________
[1] https://mitpress.mit.edu/books/statistical-language-learning
If you like it I recommend watching the whole series.
If you have time to read step-by-step derivations and want to understand the fundamentals, I think this is an excellent self-contained resource.
Rigorous maths is akin to trying to explain to your non technical friends what you do in devops: colloquialise it all you want, it’ll always be a shallow story.
Technically Linear Algebra is not "required" to understand Markov Chains, but it's a very neat way to think about them: each "step" in the chain is equivalent to multiplication of the state vector by the transition matrix.
https://github.com/melling/MathAndScienceNotes/tree/master/s...
https://soundcloud.com/jamesbradbury/9th-order-markov-chain-...
Some favourites:
- Erowid trip reports and tech recruiter emails - https://twitter.com/erowidrecruiter
- Calvin and Markov - Calvin and Hobbes strips reimagined http://joshmillard.com/markov/calvin/
- Generate your future tweets based on the DNA of your existing messages - http://yes.thatcan.be/my/next/tweet/
- Fake headlines created by smashing up real headlines - https://www.headlinesmasher.com/best/all
- The most confusing subreddit (often on the front page) - https://www.reddit.com/r/subredditsimulator
The original Markov-generated content prank: "I Spent an Interesting Evening Recently with a Grain of Salt" https://web.archive.org/web/20011101013348/http://www.sincit...
And of course (un-amusingly!) - Google's PageRank algorithm is built on Markov Chains https://en.wikipedia.org/wiki/PageRank#Damping_factor
n.b. there used to be parodies of Hacker News, but both are down: https://news.ycombniator.com/ and https://lou.wtf/phaker-news
[1] https://www.classcentral.com/course/edx-queuing-theory-from-...
https://www.youtube.com/playlist?list=PL7-jPKtc4r78-wCZcQn5I...
there is a lot of info out there about markov chains, but very little about markov decision processes (MDP).
How popular are MDP? What are their strengths? weaknesses?
-- Kalman Filters vs HMM (Hidden Markov Model):
"In both models, there's an unobserved state that changes over time according to relatively simple rules, and you get indirect information about that state every so often. In Kalman filters, you assume the unobserved state is Gaussian-ish and it moves continuously according to linear-ish dynamics (depending on which flavor of Kalman filter is being used). In HMMs, you assume the hidden state is one of a few classes, and the movement among these states uses a discrete Markov chain. In my experience, the algorithms are often pretty different for these two cases, but the underlying idea is very similar." - THISISDAVE
-- HMM vs LSTM/RNN:
"Some state-of-the-art industrial speech recognition [0] is transitioning from HMM-DNN systems to "CTC" (connectionist temporal classification), i.e., basically LSTMs. Kaldi is working on "nnet3" which moves to CTC, as well. Speech was one of the places where HMMs were _huge_, so that's kind of a big deal." -PRACCU
"HMMs are only a small subset of generative models that offers quite little expressiveness in exchange for efficient learning and inference." - NEXTOS
"IMO, anything that be done with an HMM can now be done with an RNN. The only advantage that an HMM might have is that training it might be faster using cheaper computational resources. But if you have the $$$ to get yourself a GPU or two, this computational advantage disappears for HMMs." - SHERJILOZAIR
Also, they’re just a convenient model (for some problems), not a holy truth.
If you want an overview of Markov chains as statistical models in their own right, Durbin et al.'s Biological Sequence Analysis is a well-motivated overview.
As others have said, if you know know probability, start there.
Covers limit theorems and continuous time.