No, you need as much RAM as the total model size. But it means you can load the most important tensors onto a smaller GPU. So you could run it on a PC with, say, two 32 GB RTX 5090s and 1 TB+ of system RAM.
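A rough budget check for that setup, as a sketch. The parameter count and bytes-per-parameter below are hypothetical placeholders, not figures for any particular model. It treats VRAM and system RAM as one pool and ignores KV cache, activations and OS overhead.

```python
def fits_in_memory(total_params: float, bytes_per_param: float,
                   vram_gb: float, ram_gb: float) -> bool:
    """Return True if all of the model's weights fit in VRAM + system RAM combined."""
    model_gb = total_params * bytes_per_param / 1e9
    return model_gb <= vram_gb + ram_gb

# Two 32 GB RTX 5090s plus 1 TB (taken as 1000 GB) of system RAM.
VRAM_GB = 2 * 32
RAM_GB = 1000

# Hypothetical 1-trillion-parameter model at 8-bit (1 byte/param): 1000 GB of weights.
print(fits_in_memory(1e12, 1.0, VRAM_GB, RAM_GB))  # True (1000 GB <= 1064 GB)
# Same hypothetical model at FP16 (2 bytes/param): 2000 GB of weights.
print(fits_in_memory(1e12, 2.0, VRAM_GB, RAM_GB))  # False
```

The GPUs only need to hold the hot tensors; the point is that the total has to live somewhere in fast memory.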
Probably not. Based on my understanding of MoE, the set of active parameters can change from token to token, so in the worst case (unlikely in a real scenario, but it frames the problem) you'd be streaming 49B parameters from SSD for every output token...
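Back-of-the-envelope numbers for that worst case, as a sketch. The 7 GB/s SSD read speed is an assumption (roughly a fast PCIe 4.0 NVMe drive), and the 8-bit/4-bit weight sizes are illustrative; only the 49B active-parameter figure comes from the comment above.

```python
def seconds_per_token(active_params: float, bytes_per_param: float,
                      read_gb_per_s: float) -> float:
    """Worst case: every active parameter is read from storage for each output token."""
    bytes_per_token = active_params * bytes_per_param
    return bytes_per_token / (read_gb_per_s * 1e9)

ACTIVE_PARAMS = 49e9   # active parameters per token (from the comment above)
SSD_GB_PER_S = 7.0     # assumed sequential read speed of the SSD (hypothetical)

for label, bpp in [("8-bit", 1.0), ("4-bit", 0.5)]:
    t = seconds_per_token(ACTIVE_PARAMS, bpp, SSD_GB_PER_S)
    print(f"{label}: {t:.1f} s/token, {1 / t:.2f} tokens/s")
```

That works out to several seconds per token. In practice, caching frequently used experts in RAM would cut this down, which is why it is only a bound.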