We are releasing 2-bit and 4-bit quantized versions of Mixtral, produced with the HQQ method we recently published:
https://mobiusml.github.io/hqq_blog/
https://github.com/mobiusml/hqq
The 2-bit version can run on a 24GB Titan RTX.
In terms of perplexity on the wikitext2 dataset, the results are as follows (memory footprint / perplexity, lower perplexity is better):
- Mixtral: 26 GB / 3.79
- Llama2-70B: 26.37 GB / 4.13
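To give a feel for why low-bit quantization shrinks the memory footprint this much, here is a minimal, self-contained sketch of generic group-wise asymmetric 2-bit quantization. This is an illustration only, not the HQQ algorithm itself: HQQ additionally optimizes the quantization parameters (notably the zero-point) with a half-quadratic solver rather than using the plain min/max fit shown here, and the group size of 64 is an arbitrary choice for the example.

```python
import numpy as np

def quantize_2bit(w, group_size=64):
    # Group-wise asymmetric quantization: each group of `group_size`
    # weights shares one float scale and one float zero-point, and each
    # weight is stored as a 2-bit code (4 levels: 0..3).
    # NOTE: illustrative min/max fit, not HQQ's optimized parameters.
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 3.0      # map the group range onto 4 levels
    scale[scale == 0] = 1.0            # guard against constant groups
    zero = -w_min / scale
    q = np.clip(np.round(w / scale + zero), 0, 3).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    # Reconstruct approximate float weights from the 2-bit codes.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)   # toy weight tensor
q, scale, zero = quantize_2bit(w)
w_hat = dequantize(q, scale, zero).reshape(-1)
err = np.abs(w - w_hat).mean()
```

With the 2-bit codes packed four-to-a-byte plus the per-group scale and zero-point, storage drops to roughly 1/7 of fp16, which is the effect that lets a ~47B-parameter model like Mixtral fit in 24GB of VRAM.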