
Semaphor 2y ago

> The llama.cpp 7B and 13B models can be run on CPU if you have enough RAM.

Bigger ones as well; you just have to wait longer. They're not suitable for real-time use, but if you can wait 10-20 minutes, you can run them on CPU.

int_19h 2y ago

It's not even that bad. Core i7-12700K with DDR5 gives me ~1 word per second on llama-30b - that is fast enough for real-time chat, with some patience. And things are even better on M1/M2 Macs.

Joeri 2y ago

The critical factor seems to be the ability to fit the whole model in RAM (--mlock option in oobabooga). With Apple's RAM prices, most M1/M2 owners probably don't have the 32 GB of RAM required to fit a 4-bit 30B model.
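
For a rough sense of where the 32 GB figure comes from, here is a back-of-the-envelope sketch. The function names, the 4.5 bits per weight (4-bit values plus per-block scale overhead), and the 4 GiB of headroom for context and the OS are all assumptions for illustration, not measured numbers.

```python
def quantized_model_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits_in_ram(n_params: float, bits_per_weight: float,
                ram_gib: float, headroom_gib: float = 4.0) -> bool:
    """True if weights plus an assumed headroom fit in the given RAM."""
    needed = quantized_model_gib(n_params, bits_per_weight) + headroom_gib
    return needed <= ram_gib


if __name__ == "__main__":
    # Nominal 30B parameters; 4.5 bits/weight is an assumed effective rate.
    size = quantized_model_gib(30e9, 4.5)
    print(f"weights: ~{size:.1f} GiB")
    for ram in (16, 32, 64):
        print(f"{ram} GiB RAM: fits = {fits_in_ram(30e9, 4.5, ram)}")
```

Under these assumptions the weights alone exceed what a 16 GB machine can hold, which is why 32 GB ends up being the practical minimum.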

Semaphor (OP) 2y ago

I have 64 GB RAM, but only a Ryzen 5 3600, and the larger models are very slow ;)
