
wxs 9y ago

So yeah: we can focus on vectors at different levels of the net, and these are in some sense different semantic spaces. In the article I talk about the layer immediately before the projection onto the emoji vectors. If you look at the output after the projection (and apply a softmax), you get a probability distribution over all emoji. That is a different space, in which each axis is an emoji, rather than the emoji being points distributed around the space.
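To make the two spaces concrete, here is a minimal sketch with made-up toy numbers. The dimensions, vectors, and function names are hypothetical, not taken from the article's model, and it assumes the projection is a plain dot product of the pre-projection vector `h` with each emoji vector, followed by a softmax.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def project(h, emoji_vectors):
    """Dot the hidden vector h with each emoji vector: one logit per emoji."""
    return [sum(hi * ei for hi, ei in zip(h, e)) for e in emoji_vectors]

# Hypothetical toy setup: a 3-d pre-projection space with 4 emoji.
# Each emoji is a *point* (one row) in the same 3-d space as h.
emoji_vectors = [
    [1.0, 0.0, 0.0],    # emoji 0
    [0.0, 1.0, 0.0],    # emoji 1
    [0.0, 0.0, 1.0],    # emoji 2
    [-1.0, -1.0, 0.0],  # emoji 3
]
h = [2.0, 0.5, 0.0]  # pre-projection vector for some input text

probs = softmax(project(h, emoji_vectors))
# probs lives in a different, 4-d space: one axis per emoji, entries sum to 1.
print([round(p, 3) for p in probs])
```

Here `h` and the rows of `emoji_vectors` share one space (which is what you can t-SNE), while `probs` has as many coordinates as there are emoji.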


joefkelley 9y ago

Awesome, thanks for clarifying. So does the training optimize some property of the "semantic" layer immediately before the final emoji prediction layer? Or does it just optimize accuracy of emoji prediction directly?

And then the t-SNE projection shown in the article is based on this same layer (one before prediction)?

wxs (OP) 9y ago

Well, those are sort of equivalent, since the prediction loss is backpropagated through that layer. But yeah, we use cross-entropy between the projected output and the target emoji distribution as our objective to minimize.
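A minimal sketch of why optimizing prediction directly also shapes the pre-projection layer. It assumes the projection is a dot product of the pre-projection vector `h` with each emoji vector followed by a softmax; the function names and toy values are hypothetical. The gradient of the cross-entropy with respect to `h` works out to E^T (p - t), where E stacks the emoji vectors, p is the softmax output, and t is the target distribution. In a real network that gradient would keep flowing into the RNN weights; here we step `h` directly only to show the loss dropping.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, probs):
    """H(target, probs) = -sum_k target[k] * log(probs[k])."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def loss_and_grad_h(h, emoji_vectors, target):
    """Return the cross-entropy loss and its gradient w.r.t. h."""
    logits = [sum(hi * ei for hi, ei in zip(h, e)) for e in emoji_vectors]
    probs = softmax(logits)
    loss = cross_entropy(target, probs)
    # d loss / d logits = probs - target (valid because target sums to 1).
    dlogits = [p - t for p, t in zip(probs, target)]
    # d loss / d h = E^T (probs - target): the error flows back into h.
    grad_h = [
        sum(dlogits[k] * emoji_vectors[k][j] for k in range(len(emoji_vectors)))
        for j in range(len(h))
    ]
    return loss, grad_h

# Hypothetical toy setup: 2-d space, 3 emoji, all target mass on emoji 1.
emoji_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target = [0.0, 1.0, 0.0]
h = [0.5, 0.0]

loss_before, grad = loss_and_grad_h(h, emoji_vectors, target)
h_new = [hj - 0.5 * gj for hj, gj in zip(h, grad)]  # one gradient step on h
loss_after, _ = loss_and_grad_h(h_new, emoji_vectors, target)
print(loss_before, loss_after)
```

The step moves `h` toward the target emoji's vector, which is the sense in which minimizing prediction loss and organizing the "semantic" layer go together.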

And yes, we do the t-SNE on that pre-projection space. That's why we can visualize the targets (emoji) in it. We can also t-SNE the word embeddings themselves (the input to the RNN), which is also kind of interesting: the model automatically learns all kinds of structure there. Chris Olah has a good post on word embeddings if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...
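Because each emoji vector lives in the same space as the pre-projection vectors, both can go into a single t-SNE run and land on one 2-d map. The sketch below uses a hypothetical helper and toy values and only builds the joint point set and its labels; the points could then be handed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE`.

```python
def joint_points(hidden_vectors, emoji_vectors, emoji_names):
    """Stack input hidden states and emoji vectors into one labelled point set.

    Both live in the same d-dimensional pre-projection space, so a single
    t-SNE run can place text inputs and emoji targets on the same map.
    """
    if len(emoji_vectors) != len(emoji_names):
        raise ValueError("need exactly one name per emoji vector")
    dims = {len(v) for v in hidden_vectors} | {len(v) for v in emoji_vectors}
    if len(dims) != 1:
        raise ValueError(f"all vectors must share one dimension, got {sorted(dims)}")
    points = [list(v) for v in hidden_vectors] + [list(v) for v in emoji_vectors]
    labels = ["input"] * len(hidden_vectors) + list(emoji_names)
    return points, labels

# Hypothetical toy values: two input texts and two emoji in a 3-d space.
hidden = [[2.0, 0.5, 0.0], [0.1, 1.8, 0.2]]
emoji = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
points, labels = joint_points(hidden, emoji, ["joy", "heart"])
print(labels)
```

The same helper would not apply to the post-softmax output, since that space has one axis per emoji rather than emoji as points.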
