> Tool responses can be provided via messages with the `tool` role.
This sample code shows how an implementation of a tool like `get_current_weather` might look in Python:
https://github.com/ollama/ollama-python/blob/main/examples/t...
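A rough sketch of the pattern, with the weather data hard-coded for illustration (a real tool would call a weather API; the message shapes mirror the `tool` role described above, but the exact function name and fields here are assumptions):

```python
import json

def get_current_weather(city: str) -> str:
    """Pretend weather lookup; a real implementation would hit an API."""
    fake_data = {"Toronto": {"temp_c": 11, "conditions": "cloudy"}}
    return json.dumps(fake_data.get(city, {"error": "unknown city"}))

# After the model asks to call the tool, the tool's output is appended
# as a message with role "tool" so the next generation can use it:
tool_result = get_current_weather("Toronto")
messages = [
    {"role": "user", "content": "What is the weather in Toronto?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"function": {"name": "get_current_weather",
                      "arguments": {"city": "Toronto"}}}]},
    {"role": "tool", "content": tool_result},
]
```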
Part of me was hoping for some magic plugin space where I could drop named functions, but I couldn't imagine how.
Agent Zero can already use Ollama and alternatives to run the LLMs, and this new feature should enable it to more accurately call tools, using the tool-calling support that is being built into the models that support it.
- Is the environment that the tools are run from sandboxed?
I’m unclear on when/how/why you’d want an LLM executing code on your machine or in a non-sandboxed environment.
Anyone care to enlighten?
IMO Tool is a bad word for the majority of the use cases ("calculator", "weather API"). It's more like giving the LLM an old school calculator + a constrained data retriever.
Because you, or somebody you trust, knows (at least at a high level) every line of code in the functions ultimately called, you can do it and know the tool is only really receiving data, not taking arbitrary action.
Now, letting it run an arbitrary Python process, that'd be different; I suppose that fits in too. But I think this is largely NOT how people are using tools, since if you do that, how do you ever reliably capture the output of running it and apply that output?
E.g. if I were making an AI that could suggest restaurants, I could just say "find a Mexican restaurant that makes Horchata", have it translate that to a tool call to get a list of restaurants and their menus, and then run inference on that list.
I also tinkered with a Magic: The Gathering AI that used tool calling to get the text and rulings for cards so that I could ask it rules questions (it worked poorly). It saves the user from having to remember some kind of markup to denote card names so I can pre-process the query.
It is up to the person who implements the tool to sandbox it as appropriate.
> I’m unclear on when/how/why you’d want an LLM executing code on your machine or in a non-sandboxed environment.
The LLM does not execute code on your computer. It returns a message saying that it would like to execute a tool with certain parameters. You should trust those parameters about as much as you trust the prompt and the LLM itself, which in practice probably means "not much".
Good news is that in your tool implementation you can (and should) apply all the appropriate checks using regular coding practices. This is nothing new. We do this all the time with web requests. You can check if the prompt originates from an authenticated user, if they have the necessary permissions to do the action they are about to do. You can throttle the requests, you can check that the inputs are appropriate etc etc.
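A minimal sketch of those "regular coding practices" checks: the tool handler authenticates the caller and validates the LLM-supplied parameters before doing anything, just like a web request handler. All names here are invented for illustration.

```python
# Hypothetical tool dispatcher that treats LLM-supplied parameters as
# untrusted input, exactly like parameters from a web request.
ALLOWED_ACTIONS = {"read_report", "list_invoices"}

def handle_tool_call(user: dict, action: str, target: str) -> str:
    # Authenticate and authorize before touching anything.
    if not user.get("authenticated"):
        raise PermissionError("not logged in")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action not in user.get("permissions", set()):
        raise PermissionError(f"{user['name']} may not {action}")
    # Validate the parameter the LLM chose, like any untrusted input.
    if not target.isalnum():
        raise ValueError("bad target")
    return f"{action} on {target}: ok"
```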
If the tool is side-effect free and there are no access restrictions, you can just run it without much worry. For example, imagine an LLM which can turn the household name of a plant into its Latin name. You would have a "look_up_latin_name" tool which searches in a local database. You have to follow best practices to avoid a SQL injection attack, but otherwise this should be easy.
Now imagine a more sensitive situation: a tool with difficult-to-undo side effects and strict access controls. For example, launching an ICBM attack. You would create a "launch_nukes" tool, but the tool wouldn't just launch willy-nilly. First of all, it would check that the prompt came directly from the president (how you do that is best discussed with your NSA rep in person). Then it would check that the parameter is one of the valid targets. But that is not enough yet. You want to make sure it is not the LLM hallucinating the action. So you would pop up a prompt directly on the UI to confirm the action. Something like "Looks like you want to destroy <target>. Do you want to proceed? <yes> <no>", and only launch when the president clicks yes.
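That gating pattern might look like this sketch, where a human confirmation callback stands between the LLM's request and the side effect, so a hallucinated call cannot fire on its own (all names and the target list are made up):

```python
# Hypothetical sensitive tool: identity check, parameter whitelist,
# and an out-of-band human confirmation before anything happens.
VALID_TARGETS = {"test-range-alpha"}

def launch_nukes(caller: str, target: str, confirm) -> str:
    if caller != "president":           # stand-in for real authentication
        return "refused: unauthorized caller"
    if target not in VALID_TARGETS:     # reject hallucinated parameters
        return "refused: invalid target"
    # `confirm` is a UI callback, e.g. a yes/no dialog shown to the human.
    if not confirm(f"Looks like you want to destroy {target}. Proceed?"):
        return "aborted by user"
    return f"launched at {target}"
```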
You COULD do dangerous things, but it's not like the LLM is constructing code it runs on its own.
Is this to the effect of running a local LLM, that reads your prompt and then decides which correct/specialized LLM to hand it off to? If that is the case, isn’t it going to be a lot of latency to switch models back and forth as most people usually run the single largest model that will fit on their GPU?
The reason this is cool is because it allows you to integrate with things like Home Assistant, so you can ask your chat bot or whatever to actual take actions. "Hey bot, turn on the lights in the basement" as an example.
https://openai.com/index/function-calling-and-other-api-upda...
You wouldn't make this complaint against a JS framework blogposting about their new MVC features.
As an aside, it's actually incredible that these days we idly accuse people of being actual autists just because they didn't condescend to our level first.
Really unnecessary and distasteful to speculate like this. Just ask your question if you don't understand something.
> Is this to the effect of running a local LLM,
Yes. That is what ollama does.
> that reads your prompt
Yes.
> and then decides which correct/specialized LLM to hand it off to?
No. It does not hand it off to a correct/specialized LLM (or at least, that is not the interesting use case). It hands it off to a traditionally coded program, something written without any AI in it. This traditionally coded program does some job for the AI agent and then returns a result to it. The AI agent can then use the result to answer the prompt.
Imagine as an example a calculator. Imagine you want the AI to answer the following prompt: "How much will I have to pay if I bought an apple ($2) and two bananas ($3 each) and the sales tax is 3%?"
To answer that question the LLM has to perform three steps: 1) understand that the above text stands for (2 + 2*3) * 1.03; 2) perform the arithmetic correctly; and 3) format the answer in an appropriate way (for example, "You will have to pay 8 dollars and a quarter for tax.").
You can try to train an LLM which does all 3 steps internally. It parses the input and outputs the output. But in general you will have a lot of trouble with that approach.
So instead, you train the LLM to parse the prompt and output something like "<calculator (2+2*3)*1.03>". Then your UI intercepts this output from the LLM and recognises that it is asking for a tool to be used. In this case it is trying to use the "calculator" tool with the parameter "(2+2*3)*1.03". So your UI doesn't display anything to the user, but passes "(2+2*3)*1.03" to a traditionally coded binary/script. That script calculates the result using normally programmed logic. Then the UI prompts the LLM again; this time the prompt contains the initial prompt text, the LLM's call for the tool, and the output of the tool. Now the LLM can see the right answer in front of it and, using the full context of the original prompt, can format a response.
What can a tool do? Anything really. It can open the pod bay door. It can reach out to a database. It can use a geo api to plan a route between two cities. It can read a wikipedia entry. It can write to a knowledge base. It can activate a nuclear bomb. Whatever is appropriate in your use case.