Thanks for the feedback. Glad you got something out of it.
> covered a lot of things I had to figure out myself, at great pain
My starting point for this was the Hugging Face docs, which don't offer much guidance on deploying to a k8s environment. Even the fact that you need GPUs for the model I was trying to run wasn't immediately apparent from the Mistral 7B HF docs (I'm sure this varies a lot across models).
> PVs to amortize the cost of model fetching across pod lifecycles
I'd love to pull more on that thread and figure out how to build a production-quality inference service.
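For anyone curious, here's a minimal sketch of the PV idea: a PVC backing the Hugging Face cache directory so the weights survive pod restarts. All names (the claim, the image) are illustrative, not from any real deployment.

```yaml
# Hypothetical sketch: persist the HF model cache across pod lifecycles
# so the model is downloaded once, not on every restart.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache            # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 50Gi            # Mistral 7B weights are ~15GB; leave headroom
---
apiVersion: v1
kind: Pod
metadata:
  name: inference
spec:
  containers:
    - name: server
      image: my-inference-image    # illustrative image
      env:
        - name: HF_HOME            # Hugging Face libs read the cache location from here
          value: /models/hf-cache
      volumeMounts:
        - name: model-cache
          mountPath: /models/hf-cache
      resources:
        limits:
          nvidia.com/gpu: 1        # GPU via the NVIDIA device plugin
  volumes:
    - name: model-cache
      persistentVolumeClaim:
        claimName: model-cache
```

A `ReadWriteMany` volume (or a read-only snapshot) would be the next step if multiple replicas need to share one cached copy.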