You need to quantize the 70b to run it on that kind of hardware as well, since even float16 wouldn't fit. But 405b:IQ1_M seems to be smarter than 70b:Q4_K_M in my experiments (admittedly very limited, because it's so slow).
Note that IQ1_M quants are not really "1-bit" despite the name. They're somewhere around 1.8 bpw, which just happens to be enough to fit the model into 128 GB with some room left over for inference.
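Rough back-of-the-envelope math, counting weights only and ignoring KV cache / runtime overhead (the ~4.8 bpw figure for Q4_K_M is an approximation):

```python
# Weight-only size estimate: params * bits-per-weight / 8, in decimal GB.
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"70b  @ float16 (16 bpw):  {model_size_gb(70, 16):.0f} GB")   # ~140 GB, doesn't fit in 128 GB
print(f"70b  @ Q4_K_M (~4.8 bpw): {model_size_gb(70, 4.8):.0f} GB")  # ~42 GB
print(f"405b @ IQ1_M (~1.8 bpw):  {model_size_gb(405, 1.8):.0f} GB") # ~91 GB, leaves headroom
```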