
noduerme · 3y ago · 0 points

Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B model heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.


Tostino · 3y ago

I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?

noduerme (OP) · 3y ago

With 8 cores it starts up and runs a bit faster for me, as long as it's plugged in and the fan hasn't kicked on yet. Once the fan kicks on and the CPU starts throttling, it's probably better to stick with 4.
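Whether 4 or 8 wins depends on heat and throttling, so it's worth measuring on your own machine. Below is a minimal sketch of a timing harness, assuming you can wrap one fixed workload (say, generating a fixed number of tokens) in a function `run(n)` that uses `n` threads. `run` and `time_thread_counts` are hypothetical names for illustration; neither is part of any llama implementation.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List, Tuple


def time_thread_counts(
    run: Callable[[int], None],
    candidates: Iterable[int] = (4, 8),
    trials: int = 3,
) -> Tuple[int, Dict[int, float]]:
    """Time run(n) for each thread count n; return (fastest n, median seconds per n).

    `run` is a hypothetical user-supplied callable that performs one fixed
    workload using n threads.
    """
    samples: Dict[int, List[float]] = {n: [] for n in candidates}
    for _ in range(trials):
        # Interleave the candidates so heat build-up and throttling
        # affect every thread count roughly equally.
        for n in samples:
            start = time.perf_counter()
            run(n)
            samples[n].append(time.perf_counter() - start)
    # The median is less sensitive to one-off spikes than the mean.
    medians = {n: statistics.median(s) for n, s in samples.items()}
    fastest = min(medians, key=medians.get)
    return fastest, medians


if __name__ == "__main__":
    # Stand-in workload that gets slower with more threads, just to exercise the harness.
    best, medians = time_thread_counts(lambda n: time.sleep(0.01 * n))
    for n, secs in medians.items():
        print(f"{n} threads: {secs:.3f}s")
    print("fastest:", best)
```

Interleaving the trials matters here: if you ran all the 8-core trials first and the 4-core trials after, the later runs would be the ones hit by throttling, which would skew the comparison.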

brianjking · 3y ago

All of the llama implementations for Apple are CPU only afaik.
