Haystack Agents are designed so that you can easily use them with different LLM providers. You just need to implement one standardized wrapper class for your model provider of choice (
https://github.com/deepset-ai/haystack/blob/7c5f9313ff5eedf2...).
So, back to your question: we will enable both ways in Haystack: 1) loading a local model directly via Haystack, and 2) querying self-hosted models via REST (e.g. a Hugging Face model running on AWS SageMaker). Our philosophy here: the model provider should be independent of your application logic and easy to switch.
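To make the idea concrete, here is a rough sketch of what "one wrapper class per provider" could look like. It does not use Haystack's actual classes: `ModelProvider`, `LocalProvider`, `RestProvider`, `run_step`, the endpoint URL, and the `inputs` / `generated_text` payload keys are all hypothetical names chosen for illustration. The local model and the HTTP call are passed in as plain callables so the example runs without any model or network.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ModelProvider(ABC):
    """Hypothetical wrapper interface (not Haystack's real class)."""

    @abstractmethod
    def invoke(self, prompt: str) -> str:
        """Send a prompt to the underlying model and return its text output."""


class LocalProvider(ModelProvider):
    """Option 1: wraps a model that is loaded in the same process."""

    def __init__(self, generate: Callable[[str], str]):
        # `generate` stands in for whatever runs the local model.
        self._generate = generate

    def invoke(self, prompt: str) -> str:
        return self._generate(prompt)


class RestProvider(ModelProvider):
    """Option 2: wraps a self-hosted model behind a REST endpoint."""

    def __init__(self, url: str, post: Callable[[str, Dict[str, str]], Dict[str, str]]):
        # `post` stands in for an HTTP client call; the payload shape is assumed.
        self._url = url
        self._post = post

    def invoke(self, prompt: str) -> str:
        response = self._post(self._url, {"inputs": prompt})
        return response["generated_text"]


def run_step(provider: ModelProvider, question: str) -> str:
    """Application logic: depends only on the interface, not on the provider."""
    return provider.invoke(f"Question: {question}\nAnswer:")


# Fake backends so the sketch is runnable as-is.
fake_local = LocalProvider(lambda prompt: "local:" + prompt.splitlines()[0])
fake_rest = RestProvider(
    "http://localhost:8080/generate",  # hypothetical endpoint
    lambda url, payload: {"generated_text": "rest:" + payload["inputs"].splitlines()[0]},
)

print(run_step(fake_local, "hi"))
print(run_step(fake_rest, "hi"))
```

The point of the sketch is that `run_step` never changes when you swap `fake_local` for `fake_rest`; only the wrapper does.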
In the current version, we only support option 1 for local models. This works for many of the models provided by Hugging Face, e.g. Flan-T5. We are already working on adding support for more open-source models (e.g. Alpaca), since models like Flan-T5 don't perform great when used in Agents. Support for SageMaker endpoints is also on our list. Any other options you'd like to see here?