SubQ: a sub-quadratic LLM with 12M-token context

(subq.ai)

84 points · mitchwainer · 21d ago · 40 comments

2001zhaozhao · 21d ago

Assuming this is real and much better than existing linear attention methods as advertised, not launching with a technical report is a big miss.

Edit: their blog post (https://subq.ai/how-ssa-makes-long-context-practical) does go pretty in-depth about it

Edit 2: the fact that they're going straight for an end-to-end coding product on day 1 is very ambitious. Other speed/efficiency-oriented AI companies (Cerebras and Inception come to mind) still don't have a first-party coding product after years. IMO this is absolutely the right way to go if they really do have the big breakthrough they're claiming.

avrilfanomar · 19d ago

you really call this 1-minute blog post "in-depth"?

mohsen1 · 21d ago

- magic.dev claimed a 200M context window, and two years later there's still no real product.

- They are admitting that this is built on top of a Chinese model[1]

- They committed a huge chart crime with the Y axis of a chart comparing against Opus on their website, which I can't find anymore (too embarrassing to keep?). The gap between their score (81%) and Opus's (87%) on SWE-bench was hugely minimized.

- They named the company subquadratic but in parts they said O(1) linear scaling. At O(1) you could do much more than 12M tokens context window. At O(log n) even.

I hope this is real but I doubt...

alexsubq · 19d ago

The chart crime was not intentional! We will not make you wait two years. We are O(n), not O(1). O(1) would unfortunately be an impossibility. We may as well do infinite context at that point!
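For readers following the O(1) / O(log n) / O(n) / O(n^2) exchange above, here is a rough sketch of how relative work grows with context length under each class. It ignores constant factors and memory, so only the growth ratios mean anything, and the 1M and 12M sizes are just the numbers mentioned in this thread, not measurements of SubQ.

```python
import math

def relative_work(n: int) -> dict[str, float]:
    """Relative work for a context of n tokens under three growth classes.

    Constant factors are ignored, so only comparisons between context
    sizes within the same class are meaningful.
    """
    return {
        "O(log n)": math.log2(n),
        "O(n)": float(n),
        "O(n^2)": float(n) ** 2,  # e.g. dense attention: every token scores every token
    }

def growth_factor(cls: str, n_small: int, n_large: int) -> float:
    """How many times more work n_large tokens need than n_small tokens in class cls."""
    return relative_work(n_large)[cls] / relative_work(n_small)[cls]

for cls in ("O(log n)", "O(n)", "O(n^2)"):
    factor = growth_factor(cls, 1_000_000, 12_000_000)
    print(f"{cls:>9}: going from 1M to 12M tokens costs {factor:.1f}x")
```

Going from 1M to 12M tokens multiplies linear work by 12 but quadratic work by 144, which is the gap the "sub-quadratic" name refers to.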

bbctr1 · 15d ago

What’s keeping you from releasing paper and access to the model?

esafak · 17d ago

Good luck.

pvtmert · 17d ago

> not affiliated with subq,

i see in the linked post they mention O(n) not O(1). O(1) would basically be impossible and instant. Something like no compute required, constant results...

The name subquadratic is actually good and makes sense to me, because today's models are usually O(n^2) or worse. Anything equal to or less than O(n) is basically sub-quadratic.

Meanwhile O(log n) would be logarithmic as the log name indicates. But we have a long way to go there. Maybe with double tokenizer plus extensive caching it may be possible...

What I mean here is tokenizing the user input; then capturing intent; caching intent -> response. So that next time once you get the intent, you don't need to do full transformer inference compute. This can be logarithmic complexity in terms of time complexity.
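A minimal sketch of the intent -> response caching idea, under loud assumptions: `extract_intent` is a hypothetical stand-in (it just lowercases and sorts the prompt's unique words), and the cache keeps its keys sorted so lookups use binary search. The O(log m) lookup is in the number of cached intents m, not in context length; a cache hit skips inference entirely rather than making attention itself cheaper.

```python
import bisect
from typing import Optional

def extract_intent(prompt: str) -> str:
    """Hypothetical intent extractor: a crude stand-in that lowercases the
    prompt and sorts its unique words, so reordered phrasings share a key.
    A real system would need something far more robust."""
    return " ".join(sorted(set(prompt.lower().split())))

class IntentCache:
    """Sorted-key cache mapping intent -> cached response.

    Lookups are O(log m) via binary search over m cached intents.
    Insertion is O(m) because list.insert shifts later elements.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []    # sorted intent keys
        self._values: list[str] = []  # responses, parallel to _keys

    def get(self, prompt: str) -> Optional[str]:
        key = extract_intent(prompt)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None  # cache miss: caller would fall back to full inference

    def put(self, prompt: str, response: str) -> None:
        key = extract_intent(prompt)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = response  # overwrite the existing entry
        else:
            self._keys.insert(i, key)   # keep keys sorted
            self._values.insert(i, response)
```

For example, after `cache.put("Explain sparse attention", answer)`, a later `cache.get("sparse attention explain")` returns `answer` because both prompts normalize to the same hypothetical intent key.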

artisin · 21d ago

Ah, I nearly forgot about magic.dev. I took a quick peek to check up on them. Welp, last social/blog activity was in... 2024. But hey, their careers page still says they're hiring! So they must be doing just fine.

shdh · 20d ago

They did raise over $500M

pstorm · 21d ago

I’m very surprised this isn’t getting more attention. Am I missing something?

It seems to be at or above SOTA on the given benchmarks, doesn't have context rot, is orders of magnitude faster, and uses less compute than current transformer models. I suppose it's just an announcement and we can't test it ourselves yet.

alexsubq · 21d ago

We are SOTA in some ways and not in others, continuously working to make it better! We need a little more time to scale, as we are working on things like disaggregated prefill, etc., the norms of large-scale model infra.

I am happy to answer any questions!

supern0va · 21d ago

This seems super cool if as described, but I'm sure you can understand the skepticism.

Do you anticipate having any kind of publicly accessible chat interface for testing in the near future?

Also, what benefits, if any, are there for smaller context windows? Is there still a material improvement in cost to serve under, say, 256k? I'm curious about the broader implications for the space beyond improvements for very large context windows.

dirtyalt · 20d ago

I have questions.

Can you back up your claims?

Why did you not release the white paper in parallel with the product?

Feels really fishy.

jakevoytko · 21d ago

The proof is in the pudding. At this point, there have been plenty of models that overperformed on benchmarks and underperformed on real work. So my stance is that I'm curious, I'm excited to see where it goes, and I don't believe it until I can try it.

dvfjsdhgfv · 20d ago

> Am I missing something?

Yes, this product doesn't exist.

And the last time a company claimed something similar it disappeared after taking money from investors.

amw-zero · 20d ago

Yes you're missing something: the snake oil.

shdh · 21d ago

no one has access to it yet

no published benchmarks

no paper

no demonstrations of capabilities

remaximize · 21d ago

I agree, it's a real architectural breakthrough if true

_burner256 · 20d ago

Funny how they claim a 12M context window, yet all benchmarks are cherry-picked at a 1M context window. Also, nobody has questioned how they did a training run before receiving funding. SoTA training runs cost well above $10M, yet there was no mention of funding prior to yesterday. Interesting.

creamyhorror · 21d ago

Whether this is real or not, multiple commenters here look like astroturfers - created in the past year (or hours) with very low karma

GorbachevyChase · 21d ago

There are some comments which are identical to comments on X as well. That is not to say the frontier labs do not engage in highly unethical marketing, but this is a little bit too obvious.

in-silico · 20d ago

I wonder how different their method actually is from other sub-quadratic sparse attention methods like Reformer [1] and Routing Transformer [2].

[1]: https://arxiv.org/abs/2001.04451

[2]: https://arxiv.org/abs/2003.05997
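For comparison with Reformer [1], here is a simplified NumPy sketch of its core idea: hash the shared query/key vectors with random-rotation (angular) LSH so that each token only attends to tokens in the same bucket. Routing Transformer [2] replaces the hash with online k-means clustering. This illustrates those earlier methods, not SubQ's SSA, which the thread does not describe in enough detail to reproduce; the function names here are our own.

```python
import numpy as np

def lsh_buckets(x: np.ndarray, n_buckets: int, rng: np.random.Generator) -> np.ndarray:
    """Angular LSH in the style of the Reformer paper: project onto a random
    matrix R of shape (d, n_buckets // 2) and take argmax over [xR, -xR].
    Vectors pointing in similar directions tend to share a bucket."""
    if n_buckets % 2:
        raise ValueError("n_buckets must be even")
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))
    proj = x @ r
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def bucketed_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                       buckets: np.ndarray) -> np.ndarray:
    """Each position attends only to positions in its own bucket.

    buckets[i] is the bucket of position i (Reformer hashes shared
    query/key vectors, so queries and keys share bucket ids). Simplified:
    no sort-and-chunk step, one hash round, no causal mask. A bucket of
    size s costs O(s^2) instead of O(n^2) for the whole sequence.
    """
    out = np.zeros((q.shape[0], v.shape[-1]))
    for b in np.unique(buckets):
        idx = np.nonzero(buckets == b)[0]
        scores = q[idx] @ k[idx].T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)            # softmax within the bucket
        out[idx] = w @ v[idx]
    return out

rng = np.random.default_rng(0)
qk = rng.standard_normal((16, 8))  # shared queries/keys, as in Reformer
v = rng.standard_normal((16, 4))
buckets = lsh_buckets(qk, n_buckets=4, rng=rng)
out = bucketed_attention(qk, qk, v, buckets)
```

If every position lands in one bucket, this reduces to ordinary dense softmax attention; the savings only appear when buckets stay small.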

remaximize · 21d ago

This is pretty remarkable. We've spent a lot of time finding workarounds for LLMs reading long docs. Now that's gone.

roflcopter69 · 15d ago

I'm usually okay with most LLM-assisted writing, but the amount of "it's not X. it's Y" style of phrases in https://subq.ai/how-ssa-makes-long-context-practical is disturbing.

Also, holy moly, the astroturfing.

But I'll still keep an eye on what they'll show up with in the next months. Sounds intriguing.

charliecs · 15d ago

Don't let a C-suite marketing video blow your mind. They are trying to discover the new Transformer, that's not easy. 12 million token context with worse quality means this isn't going anywhere. Want to bet me bitcoin that we won't be talking about them in 1 year? Heck, they may have found something great, but the prior should be one of skepticism.

kovek · 20d ago

> The core idea is content-dependent selection. For each query, the model selects which parts of the sequence are worth attending to, and computes attention exactly over those positions.

I don't know if this will help for things like understanding code, where all the relevant parts can be the entire 1000-line file we are analyzing, and where every token is relevant to understanding recursion, loops, function calls, etc.

This sounds like it would be great to do SSA before passing things along to a code model like Claude Code.

Let me know if I misunderstood
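The quoted description (for each query, select the positions worth attending to, then compute attention exactly over those positions) can be illustrated with a simple top-k rule. This is a hedged sketch, not SubQ's SSA: using top-k scores as the selection rule is our assumption, and this naive version still scores every query-key pair to make the choice, so it is O(n^2). A genuinely sub-quadratic method needs a cheaper way to pick candidates.

```python
import numpy as np

def topk_sparse_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                          top_k: int) -> np.ndarray:
    """Content-dependent selection via top-k: each query keeps its top_k
    highest-scoring keys and gets an exact softmax over just those positions.

    Naive on purpose: the full (n_q, n_k) score matrix is still computed
    to make the selection, so this does not save compute by itself.
    """
    top_k = max(1, min(top_k, k.shape[0]))
    scores = q @ k.T / np.sqrt(q.shape[-1])                        # (n_q, n_k)
    # argpartition places the top_k largest scores first (in no particular order)
    idx = np.argpartition(-scores, top_k - 1, axis=-1)[:, :top_k]   # (n_q, top_k)
    sel = np.take_along_axis(scores, idx, axis=-1)                  # (n_q, top_k)
    sel = sel - sel.max(axis=-1, keepdims=True)                     # numerical stability
    w = np.exp(sel)
    w /= w.sum(axis=-1, keepdims=True)                              # exact softmax over selected keys
    return np.einsum("qk,qkd->qd", w, v[idx])                       # weighted sum of selected values
```

With `top_k` equal to the sequence length this reduces to ordinary dense attention; with a small `top_k`, any token outside the selected set contributes nothing, which is exactly the concern raised above about code where every token may matter.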

alexsubq · 18d ago

Yeah, tokens are excluded, only pairwise relationships between tokens. Coding is something we are looking at carefully!

williamimoh · 21d ago

Looks like long context isn’t a problem anymore

tamarru · 21d ago

Neither are cost and latency, in the long term. LLMs will ultimately become more economically viable than they are now, broadening the scope of every existing LLM-driven application (particularly STS, conversational AI, etc.).

lostmsu · 17d ago

No API access for independent verification - vaporware. See also comment about astroturfing accounts in this thread.

noashavit · 17d ago

An architecture where compute grows linearly with context length seems dangerous. It can get very expensive as context grows, and performance degrades.
