0 points
ivankra
23d ago
But memory bandwidth (the bottleneck for LLM inference) is only marginally improved: 614 GB/s vs 546 GB/s for M4/M5 Max. Where is this 4x improvement coming from?
I think I'll pass on upgrading.
singhrac
23d ago
It’s prompt processing, i.e. prefill, and that’s compute-bound, not memory-bound.
0x457
23d ago
The 4x is for Time To First Token; it's on the graph.
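To put rough numbers on the replies above, here is a back-of-the-envelope sketch, not a benchmark. It assumes a dense model where decode re-reads all weights once per token (so it scales with bandwidth) and prefill costs about 2 FLOPs per parameter per prompt token (so it scales with compute). Only the two bandwidth figures come from the thread; the model size, parameter count, prompt length and TFLOPS values are made up for illustration, and the TFLOPS pair was picked to reproduce a 4x ratio.

```python
# Back-of-the-envelope roofline sketch. All model and compute figures below
# are hypothetical; only the two bandwidth numbers come from the thread.

def decode_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed: every token re-reads all weights once."""
    return bandwidth_gb_s / model_gb

def prefill_seconds(params_billion: float, prompt_tokens: int, tflops: float) -> float:
    """Lower bound on prefill time: about 2 FLOPs per parameter per token."""
    flops = 2.0 * params_billion * 1e9 * prompt_tokens
    return flops / (tflops * 1e12)

MODEL_GB = 20.0                        # hypothetical quantized weight size
PARAMS_B = 30.0                        # hypothetical parameter count, in billions
PROMPT_TOKENS = 8192                   # hypothetical prompt length
OLD_TFLOPS, NEW_TFLOPS = 30.0, 120.0   # hypothetical; chosen to give a 4x ratio

# Decode is bandwidth-bound, so its gain tracks the bandwidth ratio.
decode_gain = (decode_tokens_per_s(614, MODEL_GB)
               / decode_tokens_per_s(546, MODEL_GB))

# Prefill (time to first token) is compute-bound, so its gain tracks FLOPS.
prefill_gain = (prefill_seconds(PARAMS_B, PROMPT_TOKENS, OLD_TFLOPS)
                / prefill_seconds(PARAMS_B, PROMPT_TOKENS, NEW_TFLOPS))

print(f"decode speedup  ~{decode_gain:.2f}x")
print(f"prefill speedup ~{prefill_gain:.2f}x")
```

Under these assumptions the bandwidth bump buys only about 12% on token generation, while the prefill gain depends entirely on the compute ratio, which is why both can be true at once.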