FergusArgyll · 2y ago
You shouldn't have to quantize it that much; maybe you're running a lot of other programs while running inference?
Also, try using pure llama.cpp. AFAIK it has the least possible overhead.
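For reference, here's a minimal sketch of what that looks like through the llama-cpp-python bindings (a thin wrapper over llama.cpp). The model path, thread count, and layer count below are assumptions; tune them for your machine:

    # pip install llama-cpp-python  (builds llama.cpp locally; Metal is used on Apple Silicon)
    from llama_cpp import Llama

    # Hypothetical GGUF path; any quantized model file works here.
    llm = Llama(
        model_path="./phi-2.Q4_K_M.gguf",
        n_ctx=2048,        # context window size
        n_threads=4,       # roughly match your performance-core count
        n_gpu_layers=-1,   # offload all layers to the GPU (Metal) on M1
    )

    out = llm("Q: Why is the sky blue? A:", max_tokens=64)
    print(out["choices"][0]["text"])

If you want truly zero wrapper overhead, the example CLI binary that ships with the llama.cpp repo itself does the same thing with no Python in the loop.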
regularfry · 2y ago
Getting more value out of phi-2-sized models is where you really want to be on lower-end M1s.
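Rough arithmetic on why that size fits: phi-2 has about 2.7B parameters, so at roughly 4-5 bits per weight the quantized weights come in well under 2 GiB, leaving headroom on an 8 GB base M1. A quick back-of-the-envelope (the bits-per-weight figures are approximate averages, not exact GGUF sizes):

    # Approximate weight-only memory footprint for a 2.7B-parameter model.
    # Real GGUF files add some overhead (scales, embeddings, metadata).
    params = 2.7e9
    for name, bits_per_weight in [("f16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
        gib = params * bits_per_weight / 8 / 2**30
        print(f"{name:8s} ~{gib:.1f} GiB")

That prints roughly 5.0 GiB for f16, 2.7 GiB for Q8_0, and 1.5 GiB for Q4_K_M, which is why a phi-2-sized model at a mid-level quant runs comfortably where a 7B model starts to squeeze an 8 GB machine.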