I haven’t had great luck with the wizard as a counterpoint: token generation was unbearably slow, though I might have been using too large a context window. It’s an interesting model for sure, and I remember the output being decent, but I think it’s already been surpassed by other models like Qwen.
Long context windows are a problem. I gave Qwen 2.5 70b a ~115k-token context and it took ~20 minutes for the answer to finish.
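Most of that time is likely prompt processing (prefill) rather than generation: the whole 115k prompt has to go through the model before the first output token. A back-of-envelope sketch of the math, using made-up throughput numbers for illustration (not measurements of Qwen 2.5 70b on any particular setup):

```python
# Rough estimate of long-context latency.
# All throughput figures below are assumptions, not benchmarks.

prompt_tokens = 115_000   # the ~115k context from the comment
output_tokens = 1_000     # assumed answer length
prefill_tps = 120         # assumed prompt-processing speed (tokens/sec)
decode_tps = 3            # assumed generation speed (tokens/sec)

prefill_min = prompt_tokens / prefill_tps / 60
decode_min = output_tokens / decode_tps / 60

print(f"prefill: {prefill_min:.1f} min, decode: {decode_min:.1f} min, "
      f"total: {prefill_min + decode_min:.1f} min")
# -> prefill: 16.0 min, decode: 5.6 min, total: 21.5 min
```

With numbers in that ballpark, ~20 minutes for one answer is exactly what you'd expect, and shrinking the context helps far more than speeding up decoding.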
The upside of MoE models over dense 70b+ models is that their much larger total parameter count gives them more room for world knowledge, while only a few experts are active per token, so you don't pay for all of it at generation time.
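A quick sketch of that trade-off, using the publicly cited figures for Mixtral 8x22B as an illustrative assumption (not exact parameter accounting):

```python
# MoE vs. dense back-of-envelope: total params store knowledge,
# active params determine per-token compute.

moe_total_b = 141    # Mixtral 8x22B total parameters (billions)
moe_active_b = 39    # active per token (2 of 8 experts + shared layers)
dense_b = 70         # dense 70B model: total == active

print(f"MoE stores ~{moe_total_b / dense_b:.1f}x the parameters of a dense 70B,")
print(f"but each token touches only ~{moe_active_b / dense_b:.2f}x the compute.")
# -> ~2.0x the parameters, ~0.56x the per-token compute
```

The catch is that all 141B parameters still have to fit in memory, which is why MoE models are fast to run but expensive to load.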