Maybe a cross-language library that takes a binary weights file (with embedded model information) and exposes an interface similar to that of the web API? Or a local lightweight version of Nyckel that one can run on their own infrastructure (that exposes the same REST API)?
Just spitballing here; these two would be the most convenient for the use cases I have in mind.