> FlashAttention ... such that you can remain compute bound at lower batch sizes during decode.
So, which one is it then?
https://jax-ml.github.io/scaling-book/inference/ - good read!