Better HN
0 points
cjbprime
1y ago
0 comments
Wouldn't expect that to work at all.
hedgehog
1y ago
Ollama (which wraps llama.cpp) supports splitting a model across devices so you get some acceleration even on models too big to fit entirely in GPU memory.
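The split described above is controlled by how many transformer layers are offloaded to the GPU. As a minimal sketch (assuming Ollama's `num_gpu` Modelfile parameter, which caps GPU-resident layers; the layer count here is illustrative):

```
# Ollama Modelfile sketch: offload 20 layers to GPU memory,
# run the remaining layers on the CPU.
FROM llama3
PARAMETER num_gpu 20
```

The equivalent knob when invoking llama.cpp directly is the `-ngl` / `--n-gpu-layers` flag; lowering it trades speed for a smaller GPU memory footprint, which is what makes models larger than VRAM still partially accelerable.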