It's almost always better to pay more for the smarter model, than to potentially give a worse player experience.
If they had 1M+ players there would certainly be room to optimize, but starting out you'd certainly spend more trying engineer the model switcher than you would save in token costs.
PS: I think MCP/Tool Calls are a boondoggle and LLMs yearn to just run code. It's crazy how much better this works than JSON schema etc.
It's definitely a research project, this has never been done before.
(Off-topic AMA question: Did you see my voxel grid visibility post?)
We use a ton of smaller models (embeddings, vibe checks, TTS, ASR, etc) and if we had enough scale we'll try to run those locally for users that have big enough GPUs.
(You mean the voxel grid visibility from 2014?! I'm sure I did at the time... but I left MC in 2020 so don't even remember my own algorithm right now)
https://www.dexerto.com/gaming/where-winds-meet-players-are-...
https://www.rockpapershotgun.com/where-winds-meet-player-con...
Some other cool ones I've seen: https://store.steampowered.com/app/2542850/1001_Nights/ https://www.playsuckup.com/