That commenter is just wrong. We have empirical tests of the quality loss due to quantization, and even down to 4 bits the loss is so negligible that no human would ever be able to detect it. It only registers on benchmarks after tens of thousands of full-context generations.
>So do they use the weights that are say 32 bit floats and just round them to the nearest
That's how they used to do it, and it's still how 8-bit quantization works. It's called "Round to Nearest" (RTN) quantization. That's not how lower-bit quantization works anymore, though.
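To make RTN concrete, here's a minimal sketch in plain Python. The function names and the min/max scaling scheme are my own assumptions for illustration, not code from any particular library: each float is mapped onto one of 2^bits evenly spaced levels and rounded to the nearest one.

```python
def rtn_quantize(weights, bits=8):
    """Round-to-nearest: snap each float to one of 2**bits evenly spaced levels."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    # Step size between levels; fall back to 1.0 if all weights are identical.
    scale = (hi - lo) / qmax or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def rtn_dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


# Hypothetical example weights, chosen only for illustration.
weights = [-0.82, -0.11, 0.0, 0.27, 0.93]
codes, scale, lo = rtn_quantize(weights, bits=8)
restored = rtn_dequantize(codes, scale, lo)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The worst-case error per weight is half a step (`scale / 2`), which is tiny at 8 bits but grows quickly as you drop to 4 bits or fewer. That's why plain RTN stops being good enough at low bit widths.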
The current algorithms (GPTQ, RTPQ, etc.) are more complex. They include things like lining up the weights from least to greatest, placing them in bins (typically 32 or 128 weights per bin), and then computing an offset for each bin, which is added to the RTN value. In some cases bins are identical, so a redundant bin can be re-used instead of being saved twice. These are just a few of the space-saving measures that go into effective low-bit quantization without sacrificing quality.
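Here's a rough sketch of just the per-bin idea, assuming each bin stores its own scale and offset. This is not GPTQ itself, and it skips the sorting and bin re-use steps. The helper names, the synthetic weights, and the single outlier are all made up for illustration.

```python
import random


def quantize_group(group, bits=4):
    """RTN inside one bin, using that bin's own scale and offset."""
    qmax = 2 ** bits - 1
    lo, hi = min(group), max(group)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant bin
    codes = [round((w - lo) / scale) for w in group]
    return codes, scale, lo


def groupwise_roundtrip(weights, group_size=32, bits=4):
    """Quantize bin by bin, then dequantize, to measure the error."""
    restored = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        codes, scale, lo = quantize_group(group, bits)
        restored.extend(c * scale + lo for c in codes)
    return restored


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Synthetic weights (assumption): small Gaussian values plus one large outlier.
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]
weights[7] = 1.5

# One bin covering everything is equivalent to plain whole-tensor RTN.
per_tensor = groupwise_roundtrip(weights, group_size=len(weights))
per_group = groupwise_roundtrip(weights, group_size=32)

err_tensor = mean_abs_error(weights, per_tensor)
err_group = mean_abs_error(weights, per_group)
print(f"4-bit, one bin for all weights: {err_tensor:.5f}")
print(f"4-bit, 32 weights per bin:      {err_group:.5f}")
```

With a single bin, the outlier stretches the range so far that nearly every normal weight collapses onto one or two levels. With 32 weights per bin, only the bin containing the outlier suffers, at the cost of storing one scale and one offset per bin.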
It's very similar to state-of-the-art video codecs or image compression algorithms. A raw photograph taken by my digital camera is 60 MB, but a PNG of the same photo is 30x smaller at 2 MB without a single visible artifact. It should be no surprise that we can shrink models by 4x, 8x, or even more without sacrificing quality.