I haven’t had great luck with the wizard as a counterpoint: token generation was unbearably slow, though I might have been using too large a context window. It’s an interesting model for sure, and I remember the output being decent, but I think it’s already been surpassed by other models like Qwen.
Long context windows are a problem. I gave Qwen 2.5 70b a ~115k-token context and it took ~20 minutes for the answer to finish.
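Most of that time is likely prompt processing (prefill) rather than generation: the whole 115k prompt has to go through the model before the first output token. A back-of-envelope sketch of the math, using made-up throughput numbers for illustration (not measurements of Qwen 2.5 70b on any particular setup):

```python
# Rough estimate of long-context latency.
# All throughput figures below are assumptions, not benchmarks.

prompt_tokens = 115_000   # the ~115k context from the comment
output_tokens = 1_000     # assumed answer length
prefill_tps = 120         # assumed prompt-processing speed (tokens/sec)
decode_tps = 3            # assumed generation speed (tokens/sec)

prefill_min = prompt_tokens / prefill_tps / 60
decode_min = output_tokens / decode_tps / 60

print(f"prefill: {prefill_min:.1f} min, decode: {decode_min:.1f} min, "
      f"total: {prefill_min + decode_min:.1f} min")
# -> prefill: 16.0 min, decode: 5.6 min, total: 21.5 min
```

With numbers in that ballpark, ~20 minutes for one answer is exactly what you'd expect, and shrinking the context helps far more than speeding up decoding.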
The upside of MoE models over dense 70b+ models is that their much larger total parameter count gives them more room for world knowledge, while only a few experts are active per token, so you don't pay for all of it at generation time.
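A quick sketch of that trade-off, using the publicly cited figures for Mixtral 8x22B as an illustrative assumption (not exact parameter accounting):

```python
# MoE vs. dense back-of-envelope: total params store knowledge,
# active params determine per-token compute.

moe_total_b = 141    # Mixtral 8x22B total parameters (billions)
moe_active_b = 39    # active per token (2 of 8 experts + shared layers)
dense_b = 70         # dense 70B model: total == active

print(f"MoE stores ~{moe_total_b / dense_b:.1f}x the parameters of a dense 70B,")
print(f"but each token touches only ~{moe_active_b / dense_b:.2f}x the compute.")
# -> ~2.0x the parameters, ~0.56x the per-token compute
```

The catch is that all 141B parameters still have to fit in memory, which is why MoE models are fast to run but expensive to load.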