One example: an AI could look at a car's video feed and use it to determine the car's speed, but to recognize the mile/road markers it may have to use traditional CNN-type modules, kind of like what we do.
Overall I feel this has the potential to balloon into something really interesting: theoretical physics research could be largely automated using this type of combo (in the not-too-distant future).
I strongly believe the first person who figures out how to interface these language models with some kind of knowledge model or source of truth will win a lot of the pot for AI technology. Human feedback clearly helped GPT get to where it is now, but it's clearly optimized for answers that appear good rather than answers that are good and based on known facts.
edit: I spoke too soon; it actually has a module to publish an AR model object of the design you created, so this could potentially be directly usable with AR glasses.
I wonder if it would be possible to train a 7B or 13B model to generate code in just one specific programming language, training it with example problem-input/program-output pairs. Then train another small model to translate natural language in a specific domain into an input for the coder model. And maybe a third to translate the output into a different real programming language.
The point of this being that you could use smaller GPU instances and dedicate all of the limited capacity of each model to a narrower domain that may be more tractable for it.
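The proposed pipeline can be sketched as below. This is purely illustrative: every function here is a hypothetical stub standing in for a separately trained small model, and all the names and formats are my own assumptions.

```python
# Hypothetical sketch of the three-model pipeline described above.
# Each "model" is a stub; in practice each would be a separately
# trained 7B/13B network with its own narrow domain.

def domain_translator(natural_language: str) -> str:
    """Stub model 1: translate domain-specific natural language into
    the coder model's input format (assumed to be a terse spec)."""
    return f"SPEC: {natural_language.lower()}"

def coder_model(spec: str) -> str:
    """Stub model 2: generate code in the one target language the
    model was trained on (represented here as an s-expression)."""
    return f"(program {spec!r})"

def language_translator(program: str, target: str) -> str:
    """Stub model 3: translate the coder model's output into a
    different real programming language."""
    return f"// {target} translation of {program}"

def pipeline(natural_language: str, target: str) -> str:
    spec = domain_translator(natural_language)
    program = coder_model(spec)
    return language_translator(program, target)

print(pipeline("sum the first ten squares", "C"))
```

The design choice the comment is pointing at: each stage only ever sees one narrow input/output format, so each model's capacity goes entirely to that mapping rather than to general-purpose language understanding.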
[0] https://www.wolframcloud.com/obj/yanz/Base/Temp/AR/201a538b-...
I tried a Chat Notebook on Wolfram Cloud this morning, and asked it to write a script to fetch data from DBPedia and present it. It generated Wolfram Language code, so that was very cool.
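For comparison, the same kind of fetch can be sketched in plain Python against DBpedia's public SPARQL endpoint. The query and the property listing here are my own assumptions, not what the Chat Notebook actually generated:

```python
import json
import urllib.parse
import urllib.request

# DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def build_query(resource: str, limit: int = 5) -> str:
    """Build a simple SPARQL query listing properties of a DBpedia resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?property ?value "
        f"}} LIMIT {limit}"
    )

def fetch(resource: str) -> list:
    """Fetch JSON results from the SPARQL endpoint (requires network access)."""
    url = DBPEDIA_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": build_query(resource),
         "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["results"]["bindings"]

# Example with network access:
#   for row in fetch("Stephen_Wolfram"):
#       print(row["property"]["value"], "=>", row["value"]["value"])
print(build_query("Stephen_Wolfram"))
```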
Interesting test of LLMs; this one failed.