Thanks for the feedback. Glad you got something out of it.
> covered a lot of things I had to figure out myself, at great pain
My starting point for this was the Hugging Face docs, which don't offer much guidance on deploying to a k8s environment. Even the fact that you need GPUs for the model I was trying to run wasn't immediately apparent from the Mistral 7B HF docs (I'm sure this varies a lot across models).
> PVs to amortize the cost of model fetching across pod lifecycles
I'd love to pull more on that thread and figure out how to build a production-quality inference service.
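For anyone curious, here's a minimal sketch of the PV idea: a PVC backing the Hugging Face cache directory so the weights survive pod restarts. All names (the claim, the image) are illustrative, not from any real deployment.

```yaml
# Hypothetical sketch: persist the HF model cache across pod lifecycles
# so the model is downloaded once, not on every restart.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache            # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi            # Mistral 7B weights are ~15GB; leave headroom
---
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
    - name: server
      image: my-inference-image    # illustrative image
      env:
        - name: HF_HOME            # Hugging Face libs read the cache location from here
          value: /models/hf-cache
      volumeMounts:
        - name: model-cache
          mountPath: /models/hf-cache
      resources:
        limits:
          nvidia.com/gpu: 1        # GPU via the NVIDIA device plugin
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: model-cache
```

A `ReadWriteMany` volume (or a read-only snapshot) would be the next step if multiple replicas need to share one cached copy.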