> On Windows, that usually means you need to open the MSVC x64 native command prompt and run llamafile there for the first invocation, so it can build a DLL with native GPU support. After that, $CUDA_PATH/bin usually still needs to be on $PATH so the GGML DLL can find its other CUDA dependencies.
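For anyone following along, the workflow described above looks roughly like this. This is a sketch, not official docs: the model filename is illustrative, and the `-ngl` flag is assumed from llama.cpp conventions.

```shell
:: Run inside the "x64 Native Tools Command Prompt for VS" so the
:: MSVC toolchain is available for the first-run DLL build.

:: Keep CUDA's bin directory on PATH so the GGML DLL can find its
:: CUDA runtime dependencies on later runs too.
set PATH=%CUDA_PATH%\bin;%PATH%

:: First invocation compiles the native GPU DLL; subsequent runs reuse it.
llamafile -m model.gguf -ngl 35
```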
Yeah, I think the setup lost most users there.
A separate model/app approach (like Koboldcpp) seems way easier TBH.
Also, GPU support assumes either CUDA or Metal.