> On Windows, that usually means you need to open the MSVC x64 native command prompt and run llamafile there for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin usually still needs to be on $PATH so the GGML DLL can find its other CUDA dependencies.
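For anyone following along, the workflow described above looks roughly like this. This is a sketch, not official docs: the model filename is illustrative, and the `-ngl` flag is assumed from llama.cpp conventions.

```shell
:: Run inside the "x64 Native Tools Command Prompt for VS" so the
:: MSVC toolchain is available for the first-run DLL build.

:: Keep CUDA's bin directory on PATH so the GGML DLL can find its
:: CUDA runtime dependencies on later runs too.
set PATH=%CUDA_PATH%\bin;%PATH%

:: First invocation compiles the native GPU DLL; subsequent runs reuse it.
llamafile -m model.gguf -ngl 35
```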
Yeah, I think the setup lost most users there.
A separate model/app approach (like Koboldcpp) seems way easier TBH.
Also, GPU support assumes either CUDA or Metal.