It does result in significant degradation relative to an unquantized model of the same size, but even with simple llama.cpp K-quantization it's still worth it all the way down to 2-bit. The chart in this llama.cpp PR speaks for itself:
https://github.com/ggerganov/llama.cpp/pull/1684#issue-17396...
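To get an intuition for why low-bit quantization degrades quality without destroying it, here is a minimal numpy sketch of block-wise quantization. It is only illustrative: the block size, per-block scale/min scheme, and rounding are assumptions, not llama.cpp's actual Q2_K layout (the K-quants use super-blocks with their own quantized scales).

```python
import numpy as np

def quantize_blocks(weights: np.ndarray, bits: int = 2, block: int = 32) -> np.ndarray:
    """Quantize each block of `block` weights to 2**bits levels using a
    per-block scale and minimum, then dequantize back to float."""
    levels = (1 << bits) - 1
    w = weights.reshape(-1, block)
    lo = w.min(axis=1, keepdims=True)
    scale = (w.max(axis=1, keepdims=True) - lo) / levels
    scale[scale == 0] = 1.0                # avoid divide-by-zero for constant blocks
    q = np.round((w - lo) / scale)         # integer codes in [0, levels]
    return (q * scale + lo).reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 4096)).astype(np.float32)
for bits in (8, 4, 2):
    err = np.abs(quantize_blocks(w, bits) - w).mean()
    print(f"{bits}-bit: mean abs reconstruction error {err:.4f}")
```

The error grows as the bit width shrinks, but even at 2 bits each block still keeps its rough shape, which is the tradeoff the PR's perplexity chart quantifies on real models.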