I think what they mean about latency not mattering is that latency to the LLM provider doesn’t matter. So why run it yourself when there are API’s you can hit that provide a better overall experience (and seems to be dropping in cost 90% year over year).