Well, information theory isn't much more than the logarithm of probability theory, so it doesn't hurt to learn it anyway. The only thing you need to know is that given a probability distribution P there exists a compression scheme that encodes a value X with a message of P_length(X) = log(1/P(X)) bits (log base 2). This can be summarised as BITS = log(1/PROBABILITY). Entropy is just the average number of bits you need to encode a random value from distribution P with the compression scheme of distribution P, i.e. E_P[P_length(X)]. The KL(P,Q) divergence measures what happens when you encode a random value from distribution P with the compression scheme of distribution Q. Say you're compressing English text but you're using a compressor tailored to Spanish. The KL divergence is how many extra bits you need (on average) compared to encoding the English text with the English compressor:
KL(P,Q) = E_P[Q_length(X)] - E_P[P_length(X)]
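You can check this directly with a few lines of Python. This is just a sketch with made-up toy distributions P and Q (three symbols standing in for "English" and "Spanish" letter frequencies), not real language statistics:

```python
import math

def bits(p):
    # BITS = log2(1 / PROBABILITY): message length for a value of probability p
    return math.log2(1 / p)

# Toy distributions over three symbols (hypothetical numbers, not real letter frequencies)
P = {"a": 0.5, "b": 0.3, "c": 0.2}   # what the data actually looks like
Q = {"a": 0.2, "b": 0.3, "c": 0.5}   # what the compressor was tailored to

# Entropy: average bits to encode P-samples with the P-tailored scheme
H_P = sum(p * bits(p) for p in P.values())

# Cross-entropy: average bits to encode P-samples with the Q-tailored scheme
H_PQ = sum(P[x] * bits(Q[x]) for x in P)

# KL(P, Q): the extra bits you pay for using the wrong compressor
KL = H_PQ - H_P

print(f"H(P)       = {H_P:.4f} bits")
print(f"E_P[Q_len] = {H_PQ:.4f} bits")
print(f"KL(P, Q)   = {KL:.4f} bits")
```

Note that KL(P,Q) comes out non-negative: the compressor matched to the true distribution P is always at least as good on average, which is exactly Gibbs' inequality.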