AI noob here. Is every single model in iOS really just a thin adapter on top of one base model? Can everything they announced today really be built on a single base LLM with one fixed architecture? What about image generation? What about text-to-speech?
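For context on what I mean by "thin adapter": my understanding is a LoRA-style setup, where the big base weights stay loaded and each task only swaps in small low-rank matrices. A rough NumPy sketch of the idea (the sizes and names here are made up for illustration, not anything Apple has published):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # hidden size of one base-model layer (illustrative)
r = 8     # adapter rank; tiny compared to d

# The base weight matrix is loaded once and shared by every task.
W_base = rng.standard_normal((d, d)).astype(np.float32)

def make_adapter(seed):
    """A task-specific adapter: two small matrices, not a full model."""
    g = np.random.default_rng(seed)
    A = g.standard_normal((d, r)).astype(np.float32) * 0.01
    B = g.standard_normal((r, d)).astype(np.float32) * 0.01
    return A, B

def forward(x, adapter):
    A, B = adapter
    # Effective weight is W_base + A @ B, applied without materializing it.
    return x @ W_base + (x @ A) @ B

summarize = make_adapter(1)  # e.g. a hypothetical "summarization" adapter
rewrite = make_adapter(2)    # e.g. a hypothetical "rewriting" adapter

x = rng.standard_normal((1, d)).astype(np.float32)
y1 = forward(x, summarize)
y2 = forward(x, rewrite)

# Swapping tasks only means loading 2*d*r params, not d*d.
print(2 * d * r / (d * d))  # 0.015625, i.e. ~1.6% of the layer's weights
```

If that's roughly what's going on, swapping adapters is cheap enough that my RAM question below may only apply to the genuinely separate models (image gen, TTS).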
If those are obviously different models, they can't all be loaded into RAM at once. And if each one has to be loaded from storage every time an app opens, how do they do that fast enough to keep latency low?