My recent work optimizing CPU evaluation https://justine.lol/matmul/ may have come at just the right time. Mixtral 8x7b always worked best at Q5_K_M and higher, which is 31GB. So unless you've got 4x GeForce RTX 4090's in your computer, CPU inference is going to be the best chance you've got at running 8x22b at top fidelity.