Ring attention with blockwise transformers for near-infinite context

(arxiv.org)

47 points · muggermuch · 2y ago · 20 comments

chpatrick · 2y ago

Get ready for some countries putting your entire surveillance logs in the LLM and asking it if you've been naughty or not, automatically, every day.

aipushup · 2y ago

we are so f*ed

optimalsolver · 2y ago

Rather than all this effort to work around the flaws of the transformer model, maybe researchers should be looking for a better architecture altogether.

The absolutely insane amount of compute that transformers consume could probably be better used for neuroevolutionary search.

treyd · 2y ago

The unreasonable effectiveness of transformer attention outweighs a lot of the downsides of limited context length for many applications.

PartiallyTyped · 2y ago

I mean ... if you think about it, attention changes the effective weights of a model.

I am fairly certain that if you try, you can show that for any particular sequence of tokens of length N, the N-1 tokens induce a residual FFNN that results in exactly the same distribution over the next tokens given just the Nth.

treyd · 2y ago

You may be interested in "Linear Transformers Are Secretly Fast Weight Programmers": https://arxiv.org/abs/2102.11174
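To make the "attention changes the effective weights" idea concrete, here is a minimal sketch for the linear (softmax-free, unnormalized) case only, which is the setting the linked paper's title refers to. It does not demonstrate the stronger claim about standard softmax attention. All function names and the toy numbers are hypothetical, chosen for illustration: the context tokens are folded into a weight matrix W = sum of outer(v_i, k_i), and applying W to the last token's query gives the same result as attending over the context.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_attention(keys, values, query):
    """Unnormalized linear attention: sum_i (k_i . q) * v_i."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = dot(k, query)
        for j, vj in enumerate(v):
            out[j] += score * vj
    return out

def fast_weights(keys, values):
    """Build W = sum_i outer(v_i, k_i): one rank-1 update per context token."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += v[r] * k[c]
    return W

def apply_weights(W, query):
    """Plain linear layer: W @ query."""
    return [dot(row, query) for row in W]

# Toy context of three tokens (made-up numbers) and the query of the last token.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [1.0, 1.0, 0.0]]
query = [0.5, 2.0]

attended = linear_attention(keys, values, query)
projected = apply_weights(fast_weights(keys, values), query)
print(attended, projected)
```

Both lists come out identical, so in this simplified setting the context really does act as a set of weights "programmed" by the earlier tokens.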


mashygpig · 2y ago

Sounds interesting, try it and share your results here :)

ivalm · 2y ago

Why do you think so? Many people tried neuroinspired stuff and had no results. Entire field of comp neurosci produced what exactly over the past decade? People invest in transformers because they work while everything else so far doesn’t.

esafak · 2y ago

It's cool to see the founder of a major company still write papers.

heisenzombie · 2y ago

Last author can mean anything from:

- I had the whole idea and held someone's hand the whole time they did the grunt work, then I wrote the whole damn paper except for making the fonts correct.

to:

- I am the boss. I am vaguely aware that someone is researching something.

ctoth · 2y ago

I was reading the paper, and then it hit me ... Did they use ChatGPT to generate their abstract?

Almost certainly not, but I wonder how well it does at that?

Come up with a (fake) way to improve the transformer architecture, then write the title and abstract.

Title: Novel Cyclical Attention Mechanism Enhances Transformer Architectures

Abstract: The transformer architecture has emerged as a powerful model for handling sequential data across various domains. Despite its success, the fixed nature of its attention mechanism often restricts its ability to adapt to the dynamic nature of real-world data sequences. In this paper, we propose a novel Cyclical Attention Mechanism (CAM) that augments the standard transformer architecture. Unlike conventional attention mechanisms which allocate attention statically based on previous layers, the CAM operates in a cyclical fashion, allowing for a dynamic, recurrent redistribution of attention over the sequence at each layer of the transformer. This cyclical process is facilitated through a novel temporal feedback loop that integrates information from both previous and subsequent layers, allowing for a more nuanced understanding of long-term dependencies within the data. Moreover, the proposed mechanism introduces an adaptive temporal gating system that intelligently modulates the flow of information through the cycles, ensuring optimal retention and refinement of relevant information throughout the network. We demonstrate through extensive experiments on various benchmark datasets that the Cyclical Attention Mechanism significantly improves the model's ability to handle long-range dependencies, leading to substantial improvements in performance across multiple tasks including language modeling, translation, and sequence labeling. Our findings pave the way for a new line of research into dynamic attention mechanisms within transformer architectures, showcasing the potential for enhanced performance and adaptability in handling complex sequential data[1].

I know, I find it tiresome too when people share their ChatGPT responses, but this really struck me. We are very, very close to those being indistinguishable.

* I'd hate to be trying to sort out valid from invalid papers these days.

* How close are AIs to doing AI research?

* If an AI can predict something similar to your paper, is it more or less likely to be valid/true/reproducible?

[1]: https://chat.openai.com/share/ba769733-e98d-48d3-809a-7611f3...

aipushup · 2y ago

It means that ChatGPT has seen too many papers. Honestly, back when I was a PhD student, every time I reviewed papers I wondered whether I had written them myself, because that is exactly how I would have written them LoL

chaz6 · 2y ago

I am disappointed to see a paper with the phrase, in the title no less, "Near-Infinite". Something is either infinite or not; there can be no "near".

munchler · 2y ago

Maybe "unlimited" would be better?

ShamelessC · 2y ago

Virtually infinite?

sdiupIGPWEfh · 2y ago

I like "Unbounded".

sharemywin · 2y ago

Unlimited*

like all the SaaS feature pages.

thefourthchime · 2y ago

WOWOWOWOWOWOWOWOOWOWOWOWOWOWOWOW
