Better HN

0 points · DennisP · 1mo ago

No CUDA, and 1.6T parameters but only 49B active... does that mean you can run it efficiently on a 64GB MacBook?


leodavi · 1mo ago

Probably not. As I understand MoE, the active parameter set can change from token to token, so in the worst case (unlikely in a real scenario, but it frames the problem) you'd be streaming 49B parameters from SSD for every output token...
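To put a rough number on that worst case, here is a back-of-the-envelope sketch. The 49B figure is from the thread; the 4-bit quantization (0.5 bytes per parameter) and the 7 GB/s SSD read speed are assumptions picked for illustration, not facts about this model or any particular machine.

```python
def seconds_per_token(active_params, bytes_per_param, read_gb_per_s):
    """Time to stream one token's worth of active weights from storage.

    Worst case only: assumes none of the active weights are already
    cached in RAM, so every byte is read from the SSD.
    """
    bytes_per_token = active_params * bytes_per_param
    return bytes_per_token / (read_gb_per_s * 1e9)


# 49B active parameters (from the thread).
# ASSUMED: 4-bit weights (0.5 bytes/param) and a 7 GB/s SSD.
worst_case = seconds_per_token(49e9, 0.5, 7.0)
print(f"{worst_case:.1f} s per token")
```

Under those assumptions that is 24.5 GB read per token, or about 3.5 seconds per token. Real runs would likely do better, since consecutive tokens often reuse some of the same experts, but it shows why SSD streaming is not "efficient".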


segmondy · 1mo ago

No, you need as much RAM as the total model. But it does mean you can load the most important tensors onto a smaller GPU. So you can run it on a PC with, say, two 32GB RTX 5090s and 1TB+ of system RAM.
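The "as much RAM as the total model" point can be checked with quick arithmetic. This is a sketch only: the 1.6T parameter count is from the thread, while the precisions are assumed (the thread does not say how the weights are stored), and it counts weights alone, ignoring KV cache and activations.

```python
GB = 1e9


def weights_gb(total_params, bytes_per_param):
    """Size of the weights alone, in GB (no KV cache, no activations)."""
    return total_params * bytes_per_param / GB


def fits(total_params, bytes_per_param, memory_gb):
    """True if the weights alone fit in the given amount of memory."""
    return weights_gb(total_params, bytes_per_param) <= memory_gb


TOTAL_PARAMS = 1.6e12  # from the thread

# ASSUMED precisions; the actual storage format is not stated in the thread.
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    size = weights_gb(TOTAL_PARAMS, bytes_per_param)
    print(
        f"{label}: {size:.0f} GB | "
        f"64 GB laptop: {fits(TOTAL_PARAMS, bytes_per_param, 64)} | "
        f"1 TB RAM: {fits(TOTAL_PARAMS, bytes_per_param, 1024)}"
    )
```

Even at an assumed 4 bits per parameter the weights come to about 800 GB, which is why a 64GB machine is out and a 1TB+ box is the realistic floor. At 8-bit or higher, even 1TB would not hold the weights.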
