Recently, it has gotten better (though maybe not perfect yet) at calculating how many layers of a model will fit on the GPU, so you get the best performance without a bunch of tedious trial and error.
Similar to Dockerfiles, Ollama offers Modelfiles that let you tweak models from the existing library (parameters and such), or import GGUF files directly if you find a model that isn't in the library.
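As a quick sketch of what that looks like (the model name and parameter values here are just placeholders), a Modelfile uses Dockerfile-style directives like `FROM`, `PARAMETER`, and `SYSTEM`:

```shell
# Modelfile — base it on a library model, or point FROM at a local GGUF file
# e.g.  FROM ./my-model.gguf
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a concise assistant."
EOF

# Build a named model from the Modelfile, then run it
ollama create my-model -f Modelfile
ollama run my-model
```

The same `FROM` mechanism is how you import a GGUF file that isn't in the library: point it at the file on disk instead of a library model name.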
Ollama is the best way I've found to use LLMs locally. I'm not sure how well it would fare in multiuser scenarios, but there are probably better model servers for that anyway.
Running `make` on llama.cpp is really only the first step; it's not comparable to what Ollama gives you out of the box.