Because every efficiency gain you get on the phone, you also get in the datacenter (and likely in the modem as well).
The gap will always be there. If the silicon gets efficient enough to compute a question/response on the phone in 1 joule, the datacenter will be able to do it with a much smarter, much better model in 0.1 joule. And if the silicon gets that efficient, everything else on the phone gets more efficient too, so the battery gets smaller and lighter, and 1 joule becomes more 'expensive' relative to the battery's state of charge. It will never make sense, no matter how good the silicon gets.
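To put rough numbers on that, here is a back-of-the-envelope sketch. The 1 J and 0.1 J figures are the ones above; the two battery capacities are made-up values purely for illustration, not measurements of any real phone.

```python
# Back-of-the-envelope sketch. The per-query energies come from the comment;
# the battery capacities are hypothetical, chosen only to show the trend.
JOULES_PER_WH = 3600.0


def battery_fraction(query_joules: float, battery_wh: float) -> float:
    """Return the fraction of a full battery that one query consumes."""
    return query_joules / (battery_wh * JOULES_PER_WH)


phone_query_j = 1.0       # on-device cost per query (from the comment)
datacenter_query_j = 0.1  # datacenter cost per query (from the comment)

big_battery_wh = 15.0     # hypothetical battery today
small_battery_wh = 5.0    # hypothetical smaller, lighter future battery

gap = phone_query_j / datacenter_query_j
frac_big = battery_fraction(phone_query_j, big_battery_wh)
frac_small = battery_fraction(phone_query_j, small_battery_wh)

print(f"datacenter energy advantage per query: {gap:.0f}x")
print(f"1 J as a share of a {big_battery_wh:.0f} Wh battery: {frac_big:.2e}")
print(f"1 J as a share of a {small_battery_wh:.0f} Wh battery: {frac_small:.2e}")
print(f"relative cost increase on the smaller battery: {frac_small / frac_big:.1f}x")
```

The same joule takes a larger share of the smaller battery, while the datacenter's advantage per query stays where it was.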
We have GPT-4-level performance in 22B models today. Only a tiny, tiny minority actually use those, because Opus is that much better. When energy efficiency improves, the bar gets higher everywhere, in inference and in training.