If I understand correctly, there are three tiers here:
- on-device models, which handle any tasks they're able to, including summarisation and conversation with Siri
- Private Cloud Compute models (still controlled by Apple), for bigger tasks that require more compute
- external LLM APIs (only ChatGPT for now), for when the tiers above decide an external model would handle the given prompt better; the user is always asked for confirmation first
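The decision flow in those three bullets could be sketched roughly like this. Everything here is my own illustration based on the summary above (the function names, predicates, and tier labels are made up, not Apple's actual API or routing logic):

```python
# Hypothetical sketch of the three-tier routing described above.
# Tier names, predicates, and the confirmation step are assumptions
# drawn from this summary, not Apple's implementation.

def route(prompt: str, fits_on_device, better_served_externally,
          user_confirms) -> str:
    """Pick a tier for a prompt, always confirming before going external."""
    if better_served_externally(prompt):
        # External LLMs (e.g. ChatGPT) are only used with explicit consent.
        return "external_llm" if user_confirms(prompt) else "declined"
    if fits_on_device(prompt):
        return "on_device"
    # Bigger tasks fall through to Apple-controlled server-side models.
    return "private_cloud_compute"

# Example: a short summarisation prompt stays on device.
tier = route(
    "Summarise this note",
    fits_on_device=lambda p: len(p) < 100,
    better_served_externally=lambda p: "world knowledge" in p,
    user_confirms=lambda p: True,
)
```

The key property is the ordering: the external path is checked first but gated on user consent, so nothing leaves the Apple-controlled tiers without an explicit yes.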