Cross Entropy (opens in new tab)

(pandeykartikey.github.io)

21 pointspandeykartikey8y ago12 comments

12 comments

What I always wondered about is that this generally assumes that classes are assigned with an equal (human level) error probability. While this is certainly the case in heavily curated example dataset, many real world scenarios only consist of a considerably labeled positive set while the negative set is often drawn randomly from the background. Is there anything on how this can be taken into account (Besides weighting, obviously)?

pandeykartikeyOP8y ago

I too have been thinking about this problem, but I have yet to come across a viable solution.

banachtarski8y ago

"Cross-entropy is defined as the difference between the following two probability distributions"

Huh? No this is a mathematically imprecise statement (and not correct either). Most explanations use references to information theory, where a perfect knowledge of the desired probability distribution leads to a perfect allocation of bits in a binary encoding. The entropy is the expected number of bits when this allocation is done using the incorrect distribution, and obviously the goal is to minimize this, hence why it is suitable for use as a loss function.

tziki8y ago

> The entropy is the expected number of bits when this allocation is done using the incorrect distribution

Is there any source that would derive and/or explain this more in-depth? I've been trying to develop an intuition for this, but haven't come across a good explanation.

banachtarski8y ago

The other reply mentioning "kullback-leibler divergence" (aka KL divergence) is what you need to understand as this is the fundamental concept. Minimizing this quantity is equivalent to minimizing the given "cross-entropy loss" expression. More generally to understand where this comes from, you'll want to read about information theory.

banachtarski8y ago

https://rdipietro.github.io/friendly-intro-to-cross-entropy-...

I found this which seems much better

pizza8y ago

the OG http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...

also lookup kullback-leibler divergence

pixelpoet8y ago

Aside: I'm always surprised how few people notice that e.g. "cos" is rendered differently to "\cos" in TeX; for a discipline largely characterised by attention to detail, almost no programmers seem to notice.

thanatropism8y ago

I'm always surprised when people use ^T for transpose. Use ^\top, people.

dang8y ago

Please do not put "Show HN" on blog posts. This is in the rules: https://news.ycombinator.com/showhn.html.

jdonaldson8y ago

Why link to the disqus thread?

vedanshbhartia8y ago

Nice read! Looking forward to more blogs.

j / k navigate · click thread line to collapse

12 comments

patall8y ago

pandeykartikeyOP8y ago

I too have been thinking about this problem, but I have yet to come across a viable solution.

banachtarski8y ago

"Cross-entropy is defined as the difference between the following two probability distributions"

tziki8y ago

> The entropy is the expected number of bits when this allocation is done using the incorrect distribution

Is there any source that would derive and/or explain this more in-depth? I've been trying to develop an intuition for this, but haven't come across a good explanation.

banachtarski8y ago

https://rdipietro.github.io/friendly-intro-to-cross-entropy-...

I found this which seems much better

pizza8y ago

the OG http://math.harvard.edu/~ctm/home/text/others/shannon/entrop...

also lookup kullback-leibler divergence

pixelpoet8y ago

thanatropism8y ago

I'm always surprised when people use ^T for transpose. Use ^\top, people.

dang8y ago

Please do not put "Show HN" on blog posts. This is in the rules: https://news.ycombinator.com/showhn.html.

jdonaldson8y ago

Why link to the disqus thread?

vedanshbhartia8y ago

Nice read! Looking forward to more blogs.

j / k navigate · click thread line to collapse