Skip to content
Better HN
Top
New
Best
Ask
Show
Jobs
Search
⌘K
undefined | Better HN
0 points
irusensei
1y ago
0 comments
Share
You are probably swapping. On M3 max with similar memory bandwidth the output is around 4t/s which is normally on par with most people's reading speed. Try different quants.
0 comments
default
newest
oldest
steve_adams_86
1y ago
I'm on an M2 max so I shouldn't be too far behind. I'm not actually sure how the model I'm using was quantized to be honest. I'll give it a try.
j
/
k
navigate · click thread line to collapse