AI noob here. Is every single model in iOS really just a thin adapter on top of one base model? Can everything they announced today really be built on a single base LLM with one fixed architecture? What about image generation? What about text-to-speech?
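For context on what I mean by "thin adapter": my understanding is a LoRA-style setup, where the big base weights stay loaded and each task only swaps in small low-rank matrices. A rough NumPy sketch of the idea (the sizes and names here are made up for illustration, not anything Apple has published):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # hidden size of one base-model layer (illustrative)
r = 8     # adapter rank; tiny compared to d

# The base weight matrix is loaded once and shared by every task.
W_base = rng.standard_normal((d, d)).astype(np.float32)

def make_adapter(seed):
    """A task-specific adapter: two small matrices, not a full model."""
    g = np.random.default_rng(seed)
    A = g.standard_normal((d, r)).astype(np.float32) * 0.01
    B = g.standard_normal((r, d)).astype(np.float32) * 0.01
    return A, B

def forward(x, adapter):
    A, B = adapter
    # Effective weight is W_base + A @ B, applied without materializing it.
    return x @ W_base + (x @ A) @ B

summarize = make_adapter(1)  # e.g. a hypothetical "summarization" adapter
rewrite = make_adapter(2)    # e.g. a hypothetical "rewriting" adapter

x = rng.standard_normal((1, d)).astype(np.float32)
y1 = forward(x, summarize)
y2 = forward(x, rewrite)

# Swapping tasks only means loading 2*d*r params, not d*d.
print(2 * d * r / (d * d))  # 0.015625, i.e. ~1.6% of the layer's weights
```

If that's roughly what's going on, swapping adapters is cheap enough that my RAM question below may only apply to the genuinely separate models (image gen, TTS).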
If those are obviously different models, they can't all be loaded into RAM at once. And if each one has to be loaded from storage every time an app opens, how do they do that fast enough to keep latency low?