I think learning it by following the historical development can be helpful. E.g. in this case, learn the concept of attention, specifically cross-attention, first. That comes from this paper: Bahdanau, Cho, Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", 2014, https://arxiv.org/abs/1409.0473
That paper introduces it. But even that is maybe quite dense, and to really grasp it, it helps to reimplement it yourself.
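For example, here is a minimal sketch of that additive (Bahdanau-style) cross-attention in PyTorch. The class name and dimensions are my own choices, not from the paper, and a real NMT model would of course wrap this in a full encoder/decoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveCrossAttention(nn.Module):
    """Bahdanau-style additive attention: the decoder state attends
    over the encoder states to produce a context vector.
    Sketch only; names/dims are illustrative, not from the paper."""

    def __init__(self, dec_dim, enc_dim, attn_dim):
        super().__init__()
        self.W = nn.Linear(dec_dim, attn_dim, bias=False)   # projects decoder state s
        self.U = nn.Linear(enc_dim, attn_dim, bias=False)   # projects encoder states h_j
        self.v = nn.Linear(attn_dim, 1, bias=False)         # turns each position into a score

    def forward(self, dec_state, enc_states):
        # dec_state:  (batch, dec_dim)          -- current decoder hidden state
        # enc_states: (batch, src_len, enc_dim) -- all encoder hidden states
        # e_j = v^T tanh(W s + U h_j), one score per source position
        scores = self.v(torch.tanh(self.W(dec_state).unsqueeze(1) + self.U(enc_states)))
        alpha = F.softmax(scores, dim=1)           # (batch, src_len, 1), attention weights
        context = (alpha * enc_states).sum(dim=1)  # (batch, enc_dim), weighted sum of h_j
        return context, alpha.squeeze(-1)

# Quick check with random tensors:
attn = AdditiveCrossAttention(dec_dim=8, enc_dim=8, attn_dim=16)
context, alpha = attn(torch.randn(2, 8), torch.randn(2, 5, 8))
print(context.shape, alpha.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```

The core idea is just those three lines in forward: score every encoder position against the current decoder state, softmax the scores into weights, and take the weighted sum as the context vector.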
Those papers are always dense, because they have space constraints given by the conferences, max 9 pages or so. To get a more detailed overview, you can study the authors' code or other resources. There is a lot out there now on these topics, whole books, etc.