undefined | Better HN

0 pointsdekhn2y ago0 comments

Right. So if that exists, why would I want to embed my weights in the binary rather than distributing them as a side file?

I assume the answers are "because Justine can" and "sometimes it's easier to distribute a single file than two".

0 comments

simonw2y ago

Personally I really like the single file approach.

If the weights are 4GB, and the binary code needed to actually execute them is 4.5MB, then the size of the executable part is a rounding error - I don't see any reason NOT to bundle that with the model.

dekhnOP2y ago

I guess in every world I've worked in, deployment involved deploying a small executable which would run millions of times on thousands of servers, each instance loading a different model (or models) over its lifetime, and the weights are stored in a large, fast filesystem with much higher aggregate bandwidth than a typical local storage device. The executable itself doesn't even contain the final model- just a description of the model which is compiled only after the executable starts (so the compilation has all the runtime info on the machine it will run on).

But, I think llama plus obese binaries must be targeting a very, very different community- one that doesn't build its own binaries, runs in any number of different locations, and focuses on getting the model to run with the least friction.

csdvrx2y ago

> a large, fast filesystem with much higher aggregate bandwidth than a typical local storage device

that assumption gets wrong very fast with nvme storage, even before you add herding effects

1 more reply

fullspectrumdev2y ago

> But, I think llama plus obese binaries must be targeting a very, very different community- one that doesn't build its own binaries, runs in any number of different locations, and focuses on getting the model to run with the least friction.

Yes, the average user.

quickthrower22y ago

This is convenient for people who don't want to go knee deep in LLM-ology to try an LLM out on their computer. That said a single download that in turn downloads the weights for you is just as good in my book.

insanitybit2y ago

`ollama pull <modelname>` has worked for me, and then I can try out new models and updated the binary trivially.

j / k navigate · click thread line to collapse