ggml quantization is very easy with the official llama.cpp repo. It's quick and mostly dependency-free, and you can pick the right quantized size for your CPU/GPU pool.
Some of the newer models have slightly different architectures, so the author explains those differences and shows the corresponding llama.cpp invocation. Plus, you avoid pulling the larger dataset.
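For reference, a typical llama.cpp quantization run looks roughly like the sketch below. Exact script names, flags, and supported quant types vary across llama.cpp versions, and the model path and filenames here are hypothetical:

```shell
# Convert Hugging Face weights to a full-precision GGUF file
# (conversion script name and flags vary by llama.cpp version)
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# Quantize the GGUF file; Q4_K_M is a common size/quality trade-off
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```

Both steps run locally, so once you have the original weights you can produce any quantization level you want without extra downloads.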