0 points
irthomasthomas
7mo ago
0 comments
Oh, I didn't know that. Weird!
reissbaker
7mo ago
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (fits on a single H100), and to allow better utilization of B200s (which are especially fast for FP4).
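The VRAM argument is simple arithmetic: halving the bits per weight halves the memory footprint. A minimal sketch, assuming a hypothetical 70B-parameter model (the thread doesn't name a size) and counting weights only, not activations or KV cache:

```python
# Back-of-the-envelope VRAM for model weights at different precisions.
# Hypothetical 70B-parameter model; real footprints also include
# activations, KV cache, and framework overhead.
def weight_gb(params: float, bits: int) -> float:
    """Gigabytes needed to store `params` weights at `bits` bits each."""
    return params * bits / 8 / 1e9

PARAMS = 70e9
for bits in (16, 8, 4):
    print(f"FP{bits}: {weight_gb(PARAMS, bits):.0f} GB")
# FP16 needs 140 GB (multiple GPUs); FP4 needs 35 GB,
# which fits within a single 80 GB H100.
```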
irthomasthomas
OP
7mo ago
Interesting, thanks. I didn't know you could even train at FP4 on H100s.
reissbaker
7mo ago
It's impressive they got it to work; the lowest I'd heard of thus far was native FP8 training.