
zozbot234 · 26d ago · 0 points

That remark was specific to newer models like Kimi 2.x and the DeepSeek V4 series, and my comment says so clearly.

As for other models, we quantize them because we are generally constrained by the model's total footprint in bytes. A larger model quantized to fit the same footprint as a smaller, unquantized one generally performs better than the smaller original, down to Q4 or so. Even tighter quantizations (down to Q2) remain usable for some purposes, such as general Q&A chat.
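To make the footprint argument concrete, here is a rough sketch of the arithmetic. The function name `footprint_gib` and the 8B and 32B parameter counts are hypothetical, chosen only for illustration. The estimate counts weights at a nominal bit width and ignores quantization metadata, KV cache, and activations, so real files come out somewhat larger.

```python
def footprint_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB.

    Assumption: every weight is stored at exactly `bits_per_weight` bits.
    Real quantized formats add per-block scales, so this is a lower bound.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical sizes: a small model at 16 bits vs. a 4x larger model at 4 bits.
small_fp16 = footprint_gib(8, 16)
large_q4 = footprint_gib(32, 4)

print(f"8B  @ 16-bit: {small_fp16:.1f} GiB")
print(f"32B @ 4-bit:  {large_q4:.1f} GiB")
```

Under these assumptions both models need the same number of bytes for their weights, which is the trade being described: for a fixed memory budget, you can spend it on precision or on parameter count.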


hu3 · 26d ago

When you say DeepSeek v4... you do realise it is a 1.6T param model, right?

What kind of consumer hardware can run it reasonably in your mind?
