
Palmik 1y ago

You are underselling or not understanding the breakthrough. They trained a 600B-parameter model on 15T tokens for under $6M. Regardless of the provenance of the tokens, this in itself is impressive.

Not to mention post-training. Their novel GRPO technique used for preference optimization / alignment is also much more efficient than PPO.
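
To make the efficiency point concrete: GRPO, as described in the DeepSeekMath paper, drops PPO's separate value (critic) model and instead scores each sampled completion relative to the other completions drawn for the same prompt. Here is a minimal sketch of that group-relative advantage step. The rewards are made up and the helper name is hypothetical. This is not DeepSeek's code, and the use of population standard deviation and the `eps` term are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. No learned value function is involved; the group itself acts
    as the baseline. `eps` (an assumption) guards against a zero std when
    all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and the rest get a negative one, so the only extra cost over plain sampling is a mean and a standard deviation per prompt, rather than training and storing a critic.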


sho_hn 1y ago

Let's call it underselling. :-) Mostly because I'm not sure anyone's independently done the math and we just have a single statement from the CEO. I do appreciate the algorithmic improvements, and the excellent attention-to-performance-in-detail stuff in their implementation (careful treatment of precision, etc.), making the H800s useful, etc. I agree there's a lot there.

