> because it is a mixture of experts model
Do you have a source for this? I also considered that possibility, but I never saw any evidence that this is how GPT-4 is implemented.
I've always wondered how a system of multiple specialized small LLMs (with a "router" LLM in front of them all) would fare against GPT-4. Do you know if anyone is working on such a project?
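
For concreteness, here is a minimal sketch of the kind of setup I mean. The model names and the `generate()` helper are hypothetical placeholders, not any real API; you'd swap in whatever inference backend you actually use:

```python
# Sketch of a "router LLM in front of specialist LLMs" setup.
# Model names and generate() are placeholders, not a real API.

SPECIALISTS = {
    "code": "small-code-model",        # hypothetical specialist checkpoints
    "math": "small-math-model",
    "general": "small-general-model",
}

ROUTER_PROMPT = (
    "Classify the user request into exactly one category: "
    "code, math, or general. Reply with only the category name.\n\n"
    "Request: {query}\nCategory:"
)


def generate(model: str, prompt: str) -> str:
    """Placeholder for a real inference call; replace with your own backend."""
    return f"[{model}] response to: {prompt[:40]}..."


def answer(query: str) -> str:
    # Step 1: ask a small router model which specialist should handle this.
    category = generate("small-router-model", ROUTER_PROMPT.format(query=query)).strip().lower()
    if category not in SPECIALISTS:
        category = "general"           # fall back if the router misbehaves
    # Step 2: forward the original query to the chosen specialist.
    return generate(SPECIALISTS[category], query)


if __name__ == "__main__":
    print(answer("Write a quicksort in Rust."))
```

The open question, of course, is whether a fleet of specialists plus a cheap router can match a single large model that handles the routing implicitly inside its own weights.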