I already know an application for this, and AFAIK it's being explored in the SaaS space: guided learning experiences and tutoring for individuals.
My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them.
Taking this and tuning it to specific audiences would make it a great tool for learning.
Great, using GPT-4 the kids will be getting a lot of hallucinated facts returned to them. There are good use cases for tranformer currently but they're not at the "impact company earnings or country GDP" stage currently, which is the promise that the whole industry has raised/spent 100+B dollars on. Facebook alone is spending 40B on AI. I believe in the AI future, but the only thing that matters for now is that the models improve.
Compared to YouTube and Google SEO trash, or Google Home / Alexa (which do search + wiki retrieval), at the moment GPT-4 and Claude are unironically safer for kids: no algorithmic manipulation, no ads, no affiliated trash blogs, and so on. Bonus is that it can explain on the level of complexity the child will understand for their age
I still see this as a cool application. Anything that provides easier access to knowledge and improved learning is a boon.
I'd rather worry about the potential economic impact than worry about possible hallucinations from fun questions like "how big is the sun?" or "what is the best videogame in the world?", etc.
There's a ton you can do here, IMO.
Take a look at mathacademy.com, for instance. Now slap a voice interface on it, provide an ability for kids/participants to ask questions back and forth, etc. Boom: you've got a math tutor that guides you based on your current ability.
What if we could get to the same style of learning for languages? For instance, I'd love to work on Spanish. It'd be far more accessible if I could launch a web browser and chat through my mic in short spurts, rather than crack open Anki and go through flash cards, or wait on a Discord server for others to participate in immersive conversation.
Tons of cool applications here, all learning-focused.
It seems like the people who are ohhing and ahhing at the former and the people who are frustrated that this kind of this is unbelivably impractical to productize will be doomed to talk past one another forever. The text generation models, image generation models, speech-to-text and text-to-speech have reached impressive product stages. Multi-model hasn't got there because no one is really sure what to actually do with the thing outside of make cool demos.