When I wrote “backend”, it was a poor choice of word. “Meta-model” is probably better.
I hope it did not detract too much from the point of focusing on subtasks and modalities for FOSS, given that GPT-4 was built on a $163 million budget.
Finally, good point. We have no idea what OpenAI’s MoE approach is or how it works. I went back to Meta’s 2022 NLLB-200 system paper, and they didn’t even publish the exact details of the router (gate).
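For context, the commonly published baseline for MoE routing is a top-k softmax gate (in the style of Shazeer et al.): score each token against every expert, keep the k highest-scoring experts, and softmax over just those scores. A minimal NumPy sketch, with all names and dimensions illustrative rather than taken from either OpenAI’s or Meta’s (unpublished) details:

```python
import numpy as np

def topk_gate(x, W_g, k=2):
    """Top-k softmax gate: route one token to its k best experts.

    x   -- token embedding, shape (d,)
    W_g -- hypothetical learned gating matrix, shape (d, num_experts)
    Returns the chosen expert indices and their mixing weights.
    """
    logits = x @ W_g                       # one score per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest scores
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                           # softmax over the selected experts only
    return top, w

rng = np.random.default_rng(0)
x = rng.standard_normal(8)                 # one token, embedding dim 8
W_g = rng.standard_normal((8, 4))          # 4 experts
experts, weights = topk_gate(x, W_g, k=2)
```

Real systems add pieces this sketch omits (noise for load balancing, capacity limits, auxiliary losses), and it is exactly those details that papers tend to leave out.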