I am curious about those recent O(L) attention transformers (see slide 106 of http://gabrielilharco.com/publications/EMNLP_2020_Tutorial__...). If these methods are converging towards a new self-attention mechanism, I'd love to try illustrating that.
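For reference, the rough idea shared by several of these O(L) methods (linear transformers, Performer-style kernels) is to replace the softmax with a positive feature map φ, so attention becomes φ(Q)(φ(K)ᵀV) and the sequence-length-squared term drops out. Below is a minimal NumPy sketch of that trick, assuming the elu(x)+1 feature map; the function names are just illustrative, not any particular library's API.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, a common positive feature map in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """Approximates softmax(Q K^T) V in O(L):
    phi(Q) @ (phi(K)^T V) / (phi(Q) @ sum_j phi(K_j))."""
    Qf = elu_feature_map(Q)           # (L, d)
    Kf = elu_feature_map(K)           # (L, d)
    kv = Kf.T @ V                     # (d, d_v), summed over the sequence once
    z = Qf @ Kf.sum(axis=0)           # (L,) per-query normalizer
    return (Qf @ kv) / (z[:, None] + eps)

# Tiny usage example with random inputs
L, d, d_v = 8, 4, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d_v))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```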
What other attention modes are you referring to? Did something in particular catch your attention?