This reminds me of another comment I replied to elsewhere today: it's hard to even pretend I have global usage stats, so I won't.
There's a certain kind of myopia that comes from overindexing on llama.cpp, and it's easy to recognize.
to wit:
> not aware of a competing format for quantized models
ONNX. That's how it's done in prod, and on other models besides (and including) LLaMA. Quantization is a general technique, not something tied to one file format. 100 small variants of llama2 GGML weights feels like spam from that perspective (sort of Civitai vs. Hugging Face; Hugging Face smartly stopped that with AI art).
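To make "quantization is a general technique" concrete, here's roughly what format-agnostic post-training quantization looks like with ONNX Runtime. A minimal sketch, assuming you've already exported an fp32 model; the file paths are hypothetical placeholders:

    from onnxruntime.quantization import quantize_dynamic, QuantType

    # Dynamic post-training quantization: weights are stored as int8,
    # activations are quantized on the fly at inference time.
    quantize_dynamic(
        model_input="llama_fp32.onnx",   # hypothetical fp32 export
        model_output="llama_int8.onnx",  # quantized result
        weight_type=QuantType.QInt8,     # 8-bit integer weights
    )

The output is still an ordinary .onnx file that runs anywhere onnxruntime does, which is the point: one format, quantization as a processing step, rather than a new artifact per quantization level.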
llm.mlc.ai for a more academic / less ad-hoc approach.
> [stars on github]
It's great for a very narrow and simple use case that happens to match a large demographic on GitHub, and the demographic of people talking about LLMs casually on HN: MacBook owners who want to run models locally and dream of a future where they don't have to ship their data to servers to get personalization. Something with 5% of overall usage can still be #2 in usage, if that makes sense.