Ask HN: How do you deploy and scale ML (±DL) models?
2. Why is there so little "unbiased" info about deploying/serving ML models in production? (I mean apart from the official docs of frameworks like e.g. TensorFlow, which obviously push their mothership's own services/solutions.)
3. Do you hand-code microservices around your TF or PyTorch (or sklearn / homebrewed / "shallow" learning) models?
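(By "hand-coding" I mean roughly this kind of thing, just with a real model behind it. A stdlib-only sketch; `predict` here is a dummy stand-in, not any framework's API:)

```python
# Minimal hand-rolled "model microservice" sketch (stdlib only).
# `predict` is a hypothetical stand-in for model.predict() from
# TF / PyTorch / sklearn -- swap in real model loading + inference.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Dummy "model": just averages the input features.
    return sum(features) / len(features)

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Single-process server; scaling this is exactly the question.
    HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()
```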
4. Do you use TensorFlow Serving? (If so, does it work well for you with PyTorch models too?)
5. Is using Go infra like e.g. the Cortex framework common? (I keep reading about it, I love the idea, and I'd love to use a static language here (just not Java), but I've talked with noooone who's actually used it.)
6. And going beyond the basics: is there any good, established recipe for deploying and scaling models with dynamic re-training (e.g. the user app exposes a "retrain with params X + Y + Z" API action, callable in response to user actions, so users control training too) that doesn't break horribly with more than tens of users?
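(The naive version of what I have in mind, to be concrete: serialize user-triggered retrains through one worker queue so N concurrent "retrain" calls don't launch N training jobs. All names here are made up, and `retrain_model` is a dummy stand-in for real training:)

```python
# Sketch: funnel user-triggered retrain requests through a single
# worker thread so concurrent requests don't thrash the machine.
# `retrain_model` is a hypothetical stand-in for actual training code.
import queue
import threading

retrain_queue = queue.Queue()
results = {}  # job_id -> new model version; a real service would use a DB

def retrain_model(params):
    # Stand-in for real training; returns a fake "model version" tag.
    return "model-" + "-".join(f"{k}={v}" for k, v in sorted(params.items()))

def worker():
    # One job at a time; everything else waits in the queue.
    while True:
        job_id, params = retrain_queue.get()
        try:
            results[job_id] = retrain_model(params)
        finally:
            retrain_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def submit_retrain(job_id, params):
    # An API handler would call this, return 202 Accepted immediately,
    # and let the user poll a status endpoint for the result.
    retrain_queue.put((job_id, params))

submit_retrain("job-1", {"X": 1, "Y": 2, "Z": 3})
retrain_queue.join()  # only for the demo; a real service polls instead
print(results["job-1"])  # -> model-X=1-Y=2-Z=3
```

(But that's a single box; what I'm asking about is whether anything like this exists as an established, multi-node pattern.)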
P.S. Links to any collections of "established best practices" or "playbooks" would be awesome!