ggml quantization is very easy with the official llama.cpp repo. It's quick and mostly dependency-free, and you can pick the right quantized size for your CPU/GPU pool.
Some of the newer models have slightly different architectures, so the author explains those differences and shows the corresponding llama.cpp invocation. Plus, you avoid pulling the larger dataset.
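For reference, a typical llama.cpp quantization run looks roughly like the sketch below. Exact script names, flags, and supported quant types vary across llama.cpp versions, and the model path and filenames here are hypothetical:

```shell
# Convert Hugging Face weights to a full-precision GGUF file
# (conversion script name and flags vary by llama.cpp version)
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# Quantize the GGUF file; Q4_K_M is a common size/quality trade-off
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```

Both steps run locally, so once you have the original weights you can produce any quantization level you want without extra downloads.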