Cognition Releases SWE-1.5: Near-SOTA Coding Performance at 950 tok/s (opens in new tab)

(cognition.ai)

11 pointsyashvg6mo ago8 comments

8 comments

swyx6mo ago

(coauthor) xpost here: https://x.com/cognition/status/1983662836896448756

happy to answer any questions. i think my higher level insight to paraphrase McLuhan, "first the model shapes the harness, then the harness shapes the model". this is the first model that combines cognition's new gb200 cluster, cerebras' cs3 inference, and data from our evals work with {partners} as referenced in https://www.theinformation.com/articles/anthropic-openai-usi...

CuriouslyC6mo ago

In the interest of transparency you should update your post with the model you fine tuned, it matters.

swyx6mo ago

this is not a question and an assertion that your values are more important than mine, with none of my context nor none of your reasoning. you see how this tone is an issue?

regardless of what i'm allowed to say, i will personally defend that actually its increasingly less important the qualities of the base model you choose as long as its "good enough", bc then the RL/posttrain qualities and data takes over from there and is the entire point of differentiation

CuriouslyC6mo ago

If you had enough tokens to completely wash out the parent latent distribution you would have just trained a new model instead of fine tuning. That means by definition your model is still inheriting properties of the parent, and for your business customers who want predictable, understandable systems, knowing the inherited properties of your model is going to be useful.

I think the real reason is that it's a Chinese model (I mean, come on) and your parent company doesn't want any political blowback.

1 more reply

pandada86mo ago

very curious, which model can only run up to 950 tok/s even with cerebras.

j / k navigate · click thread line to collapse

8 comments

swyx6mo ago

(coauthor) xpost here: https://x.com/cognition/status/1983662836896448756

CuriouslyC6mo ago

In the interest of transparency you should update your post with the model you fine tuned, it matters.

swyx6mo ago

this is not a question and an assertion that your values are more important than mine, with none of my context nor none of your reasoning. you see how this tone is an issue?

CuriouslyC6mo ago

I think the real reason is that it's a Chinese model (I mean, come on) and your parent company doesn't want any political blowback.

1 more reply

pandada86mo ago

very curious, which model can only run up to 950 tok/s even with cerebras.

j / k navigate · click thread line to collapse