The new tensor cores, sorry, "Neural Accelerators", only really help with prompt processing (aka prefill), not with token generation. Prefill is compute bound; token generation is memory-bandwidth bound.
Hopefully the Ultra version (if it exists) has a bigger jump in memory bandwidth and maximum RAM.
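
To put a rough number on the "memory bound" part, here is a back-of-envelope sketch. It assumes a dense model at batch size 1, where every generated token has to stream all the weights from memory once, and it ignores KV cache reads and any overhead, so it gives a ceiling rather than a prediction. The function name and the figures in the example are made up for illustration and are not measurements of any real chip.

```python
def decode_tokens_per_second_upper_bound(
    bandwidth_gb_s: float,
    params_billion: float,
    bytes_per_param: float,
) -> float:
    """Rough ceiling on decode speed for a dense model at batch size 1.

    Assumption: each generated token reads every weight from memory once,
    so tokens/s cannot exceed bandwidth divided by model size.
    KV cache traffic and other overhead are ignored.
    """
    # Billions of parameters times bytes per parameter gives size in GB.
    model_size_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Hypothetical figures: a 10B-parameter model quantized to 4 bits
    # (0.5 bytes per parameter) at two illustrative bandwidths.
    for bandwidth in (100.0, 200.0):
        ceiling = decode_tokens_per_second_upper_bound(bandwidth, 10.0, 0.5)
        print(f"{bandwidth:.0f} GB/s -> at most {ceiling:.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth and does not depend on compute at all, which is why extra matmul throughput speeds up prefill but barely moves token generation.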