M3 Ultra — 819 GB/s
M2 Ultra — 800 GB/s
M1 Ultra — 800 GB/s
M5 Max (40-core GPU) — 610 GB/s
M4 Max (40-core GPU) — 546 GB/s
M4 Max (32-core GPU) — 410 GB/s
M2 Max — 400 GB/s
M3 Max (40-core GPU) — 400 GB/s
M1 Max — 400 GB/s
Or, counting only portable/MacBook chips: M5 Max (top model, 64/128 GB), M4 Max (top model, 64/128 GB), M1 Max (64 GB). Everything else is slower for local LLM inference, or at best ties the M1 Max (the M2 Max and the 40-core M3 Max also sit at 400 GB/s).
TLDR: An M1 Max is faster than every M5 chip with one exception: the 40-GPU-core M5 Max, the top model, available only in 64 and 128 GB configurations. Any M5 Pro (or any M-series Pro, for that matter) will be slower than an M1 Max on LLM inference, and the M2/M3 Max are at best no faster. Meanwhile any Ultra chip, even the M1 Ultra, beats any Max chip, including the M5 Max (though you may prefer the M2 Ultra for its hardware bfloat16 support; that matters little for quantized models).
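To see why bandwidth is the number that matters: decode (token generation) is essentially memory-bound, so an upper bound on tokens/sec is bandwidth divided by the bytes read per generated token, which is roughly the model's weight footprint. A back-of-envelope sketch (the chip subset and the 70B/4-bit example are illustrative; real throughput lands well below this ceiling because of KV-cache traffic, kernel overhead, and imperfect bandwidth utilization):

```python
# Rough decode-throughput ceiling: assumes generation is fully
# memory-bandwidth-bound, i.e. every token streams all weights once.
# Real-world numbers are lower (KV cache, overhead, utilization < 100%).

BANDWIDTH_GBPS = {  # from the list above
    "M3 Ultra": 819,
    "M1 Ultra": 800,
    "M5 Max (40-core GPU)": 610,
    "M4 Max (40-core GPU)": 546,
    "M1 Max": 400,
}

def est_tokens_per_sec(bandwidth_gbps: float, params_b: float,
                       bytes_per_param: float) -> float:
    """Upper bound: bytes/s available divided by bytes read per token."""
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_gbps * 1e9 / model_bytes

# Example: 70B parameters at 4-bit quantization (~0.5 bytes/param, ~35 GB)
for chip, bw in BANDWIDTH_GBPS.items():
    print(f"{chip}: ~{est_tokens_per_sec(bw, 70, 0.5):.0f} tok/s ceiling")
```

At 400 GB/s against a ~35 GB quantized 70B model that ceiling is about 11 tok/s, which is why the bandwidth table above, not GPU core count or chip generation, predicts relative decode speed.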