Better HN
0 points
avazhi
2d ago
0 comments
Qwen's MoE models are god-awful when they're only running 2B parameters, or whatever they downscale to, while active. It isn't a 400B model if a couple of orders of magnitude fewer parameters are active when you're actually running inference...
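For a sense of scale, here is a minimal sketch of the arithmetic behind that complaint. It takes the comment's rough figures (400B total, 2B active) at face value; they are the poster's ballpark numbers, not a published Qwen spec, and `active_ratio` is a hypothetical helper name used only for illustration.

```python
import math


def active_ratio(total_params: float, active_params: float) -> tuple[float, float]:
    """Return (fraction of parameters active per token, gap in orders of magnitude)."""
    fraction = active_params / total_params
    orders = math.log10(total_params / active_params)
    return fraction, orders


# Ballpark figures quoted in the comment, assumed here for illustration only.
fraction, orders = active_ratio(400e9, 2e9)
print(f"active fraction: {fraction:.3%}")
print(f"gap: {orders:.1f} orders of magnitude")
```

With those inputs the active share comes out to half a percent of the total, a gap of a bit over two orders of magnitude. Swap in the total and active counts from a specific model card to check a real model.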