It seems like the people who are ohhing and ahhing at the former and the people who are frustrated that this kind of this is unbelivably impractical to productize will be doomed to talk past one another forever. The text generation models, image generation models, speech-to-text and text-to-speech have reached impressive product stages. Multi-model hasn't got there because no one is really sure what to actually do with the thing outside of make cool demos.