We are releasing new 2-bit Mixtral models. They use a mixed HQQ 4-bit/2-bit configuration, yielding a significantly better model (perplexity 4.69 vs. 5.90) for a negligible 0.20 GB increase in VRAM.
Base: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-a...
Instruct: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-...
Shout-out to Artem Eliseev and Denis Mazur for suggesting this idea (https://github.com/mobiusml/hqq/issues/2).
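To give a feel for what a mixed-precision setup means in practice, here is a minimal, hypothetical sketch of a per-layer bit-width map. The module names follow HF's Mixtral implementation, but the specific bit widths and group sizes per layer group are illustrative assumptions, not the exact settings of the released checkpoints:

```python
# Hypothetical mixed 4-bit/2-bit layout (illustrative only; the actual
# per-layer assignment in the released models may differ).
quant_config = {
    # Attention projections are comparatively small, so keeping them at
    # 4-bit costs little VRAM while protecting quality.
    "self_attn.q_proj": {"nbits": 4, "group_size": 64},
    "self_attn.k_proj": {"nbits": 4, "group_size": 64},
    "self_attn.v_proj": {"nbits": 4, "group_size": 64},
    "self_attn.o_proj": {"nbits": 4, "group_size": 64},
    # The MoE expert weights dominate the parameter count, so they are
    # the ones pushed down to 2-bit.
    "block_sparse_moe.experts.w1": {"nbits": 2, "group_size": 16},
    "block_sparse_moe.experts.w2": {"nbits": 2, "group_size": 16},
    "block_sparse_moe.experts.w3": {"nbits": 2, "group_size": 16},
}
```

Because the experts hold the vast majority of Mixtral's weights, quantizing only the attention projections at 4-bit explains why the quality gain comes at such a small VRAM cost.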