> in as little as 16GB of RAM with room to spare.
I don't think that's the case. For full speed you still need (5B*8)/2 plus roughly 2 to a few B of overhead, which is already over 20GB.
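To make that arithmetic concrete, here is a rough sketch. It assumes the "/2" means 4-bit quantized weights (half a byte per parameter) and takes the low end of the overhead guess (2GB); both of those, and the function name, are my own assumptions for illustration.

```python
def weights_gb(n_experts: int, params_per_expert_b: float, bits_per_param: int) -> float:
    """Memory for all expert weights in GB (1 GB = 1e9 bytes).

    params_per_expert_b is in billions of parameters, so billions of
    parameters times bytes per parameter gives GB directly.
    """
    bytes_per_param = bits_per_param / 8
    return n_experts * params_per_expert_b * bytes_per_param


# Figures from the comment: 8 experts of ~5B parameters each.
# Assumption: 4-bit quantization, which is where the "/2" comes from.
all_experts_gb = weights_gb(n_experts=8, params_per_expert_b=5.0, bits_per_param=4)

# Assumption: low end of the "2 to a few" overhead estimate.
overhead_gb = 2.0
total_gb = all_experts_gb + overhead_gb

print(f"experts: {all_experts_gb:.1f} GB, total: {total_gb:.1f} GB")
print("fits in 16 GB:", total_gb <= 16)
```

Even with the smallest overhead guess, the total lands above 16GB.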
I think the experts are chosen per token? That means that yes, you technically only need two experts plus the router/overhead in VRAM for any given token, but unless you can fit them all you'll constantly be loading in different experts, which would still be terrible for performance.
So you'll still be PCIe/RAM speed limited unless you can fit all of the experts into memory (or get really lucky and keep needing the same two experts).
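A back-of-the-envelope sketch of how bad the swapping could get. Everything here is a simplifying assumption rather than a measurement: routing is treated as uniformly random over 8 experts, exactly two experts are resident at a time, each expert is 2.5GB (5B parameters at 4 bits), and the link moves 32GB/s (about the theoretical peak of PCIe 4.0 x16). Real models often route separately in every layer, which this ignores.

```python
from math import comb

N_EXPERTS = 8         # from the comment
ACTIVE = 2            # experts used per token, from the comment
EXPERT_GB = 2.5       # assumption: 5B params at 4 bits each
LINK_GB_PER_S = 32.0  # assumption: roughly PCIe 4.0 x16 peak


def p_same_pair(n: int, k: int) -> float:
    """Chance the next token picks exactly the k experts already resident,
    assuming every k-subset of the n experts is equally likely."""
    return 1 / comb(n, k)


def expected_loads(n: int, k: int) -> float:
    """Expected number of experts that must be fetched for the next token.

    With k resident and k chosen uniformly from n, the expected overlap
    is k*k/n (mean of a hypergeometric distribution).
    """
    return k - k * k / n


loads = expected_loads(N_EXPERTS, ACTIVE)
seconds_per_token = loads * EXPERT_GB / LINK_GB_PER_S

print(f"P(no reload needed): {p_same_pair(N_EXPERTS, ACTIVE):.3f}")
print(f"expected experts fetched per token: {loads:.2f}")
print(f"transfer-bound ceiling: {1 / seconds_per_token:.1f} tokens/s")
```

Under these assumptions the "really lucky" case happens about 1 time in 28, and the transfer time alone caps throughput before any compute is counted.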