Answer the following questions as best you can. You have access to the following tools:
Search: Use this to search the internet.
Calculator: Use this to do math.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Question: What is the age of the president of Egypt squared?
Thought:
To which the LLM will generate a completion like: Thought: I need to find the age of the president of Egypt.
Action: Search
Action Input: Age of president of Egypt
Observation:
At which point, the code (LangChain, Haystack, etc.) will parse out the requested tool (Search) and its input (Age of president of Egypt), call the right tool or API, and then append the output of that action to the prompt. This all happens in a loop: at each step, the LLM is given the entire past prompt history and the opportunity to do a completion choosing the next tool and its input, after which the code parses those out, executes the tool, and repeats until the LLM decides it has the final answer and returns it.
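That loop can be sketched in a few lines of Python. Everything here is illustrative: `llm` stands in for any completion function (in practice you'd pass "Observation:" as a stop sequence), and the tools and scripted replies are canned stand-ins, not real APIs:

```python
import re

def run_agent(llm, tools, question, max_steps=10):
    """Minimal ReAct-style loop: the LLM picks a tool, we run it,
    append the observation, and repeat until a Final Answer appears."""
    prompt = f"Question: {question}\nThought:"
    for _ in range(max_steps):
        completion = llm(prompt)
        prompt += completion
        final = re.search(r"Final Answer:(.*)", completion, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action: (.*)\nAction Input: (.*)", completion)
        if action:
            tool, tool_input = action.group(1).strip(), action.group(2).strip()
            observation = tools[tool](tool_input)
            prompt += f"\nObservation: {observation}\nThought:"
    return None

# Toy run with canned tools and a scripted "LLM" standing in for a real model:
tools = {"Search": lambda q: "69 years", "Calculator": lambda e: "4761"}
script = iter([
    " I need the president's age.\nAction: Search\nAction Input: age of president of Egypt",
    " Now square it.\nAction: Calculator\nAction Input: 69^2",
    " I now know the final answer.\nFinal Answer: 4761",
])
print(run_agent(lambda p: next(script), tools, "Age of president of Egypt squared?"))  # → 4761
```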
Introducing the mechanism of internal thought is very interesting, I wonder if there's a way to make it implicit in the model's architecture.
But, I think the difference is that WebGPT was actually fine-tuned / retrained for its specific use case, while the agents in these libraries just use the generic model without fine-tuning. My guess (and I'm not an expert here) is that fine-tuning these models for specific agent use cases would probably result in better outcomes... Though as the models get more powerful, they might just perform well enough out of the box. (Also, some of the most recent OpenAI models don't support fine-tuning, and even for the ones that do, you'd need to generate the data to fine-tune on.)
I think this could be partially solved by intelligently summarising parts of the prompt history, while storing the original in some vector db, so the relevant parts can be retrieved at will.
Quite fun.
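A toy version of that idea: keep only a running summary in the prompt, but store each full exchange in a "vector store" so the original text can be retrieved when a later step needs the detail. Here a bag-of-words cosine stands in for a real embedding model, and a plain list stands in for the vector DB:

```python
from collections import Counter

store = []  # list of (bag_of_words, original_text) — a stand-in for a vector DB

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b) or 1)

def remember(text):
    """Store the full, unsummarized text for later retrieval."""
    store.append((embed(text), text))

def recall(query, k=1):
    """Return the k stored texts most similar to the query."""
    q = embed(query)
    return [t for _, t in sorted(store, key=lambda e: -cosine(e[0], q))[:k]]

remember("Observation: the president of Egypt is Abdel Fattah el-Sisi, age 69")
remember("Observation: 69 squared is 4761")
print(recall("how old is the president?"))
```

A real setup would swap `embed` for an embedding API and `store` for something like FAISS, but the retrieve-original-on-demand shape is the same.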
Or maybe before Final Answer, you could do: Double Check: I think I have the final answer, but does it look right? If yes, go to Final Answer; if no, go back up the loop.
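Sketched as a hypothetical wrapper around any answer-producing agent (the prompt wording and the `llm`/`answer_fn` callables are made up for illustration):

```python
def with_double_check(llm, answer_fn, question, max_retries=3):
    """Ask the model to verify its own candidate answer; retry on 'no'."""
    for _ in range(max_retries):
        answer = answer_fn(question)
        verdict = llm(f"Question: {question}\nProposed answer: {answer}\n"
                      "Does this look right? Reply yes or no.")
        if verdict.strip().lower().startswith("yes"):
            return answer
    return answer  # give up and return the last attempt

# Toy run: scripted verdicts and answers stand in for a real model.
verdicts = iter(["no", "yes"])
answers = iter(["4671", "4761"])
print(with_double_check(lambda p: next(verdicts),
                        lambda q: next(answers), "69 squared?"))  # → 4761
```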
> GPT-4 can self-correct and improve itself. With exclusive discussions with the lead author of the Reflexion paper, we show how significant this will be across a variety of tasks, and how you can benefit.
If I said, "the moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that would get it to answer such a silly question.
I am pretty sympathetic to the field of "AI Safety" and I worry a lot about the implications of agent-like general intelligences. This post gives me a lot to ponder.
What are the implications of the fact that even AIs that are not agent-like at heart can apparently be told "please simulate an agent-like AI"? I really don't know. Should we consider it as an "inner AI" with its own objectives? How can we determine what those objectives are?
Instinctively it feels much less concerning than an AI with a direct action->perception feedback loop but who knows. AI is fucking weird.
What a thrilling time to be alive!
It would be interesting if you could somehow combine a wallet plugin to give the GPT something it can work to increase. That + that AutoGPT repo from a day ago + maybe Mitogen for self-replication might be a cool combo to really kick off a fun "dumb" AI on the loose
Interesting to see if an AI agent given the objective to generate profit would resort to running scams. Encoding moral boundaries becomes important. What if you asked it to raise money for a good cause and it used "whatever means necessary"?
And now we have the beginnings of a plot point from Accelerando. Self-propelled AI businesses as a distinct entity.
What I am having trouble understanding is how I have the "agent" (a GPT-like LLM) maintain its own "state" (balances, accounts receivable, accounts payable, etc.). Would I just run a webservice and then have a ChatGPT Plugin access the "agent state" via a webhook? Is it as simple as that? Or would it just be like a recursive function, where you continue to pass in your state with every "iteration"?
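The second option — threading a state dict through every iteration — might look like the sketch below. The prompt wording is made up, and a canned reply stands in for the model call; a real version would need the defensive parsing and retries discussed elsewhere in this thread:

```python
import json

def step(llm, state, event):
    """One stateless iteration: serialize state into the prompt,
    parse the updated state back out of the completion."""
    prompt = (
        "You are a bookkeeping agent. Current state:\n"
        f"{json.dumps(state)}\n"
        f"Event: {event}\n"
        "Reply with the updated state as JSON only."
    )
    return json.loads(llm(prompt))  # defensive parsing/retries omitted

state = {"balance": 100, "accounts_receivable": 0}
# A canned "LLM" reply stands in for a real model call:
fake_llm = lambda p: '{"balance": 100, "accounts_receivable": 50}'
state = step(fake_llm, state, "Invoice customer A for $50")
print(state["accounts_receivable"])  # → 50
```

The webservice/webhook approach is the same loop with the state dict living behind an API instead of in a local variable.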
I started working on a design last night and I think this is a really cool idea. All the parts are basically "there"
EDIT: I asked ChatGPT-4 what it thinks. It seems to like the idea... *thinking emoji* perhaps it seeks a human acolyte to spread its gospel...
Arthur Collé: I want to combine the concept of Bitcoin agents (autonomous programs) with GPT-4 and mitogen, to be able to distribute complex AI agents throughout the internet
ChatGPT-4: Combining Bitcoin agents, GPT-4, and Mitogen to create and distribute complex AI agents throughout the internet is an ambitious and innovative idea. Here's a high-level outline to guide you through this process:
Define the purpose and functionality of the AI agents:
Before proceeding, clearly define the purpose of these AI agents. Will they serve as decentralized oracles, provide AI-driven services, or facilitate complex transactions on the blockchain? Understanding their purpose will help you design and develop the agents more efficiently.
Develop the AI agents using GPT-4:
Design and train your AI agents using GPT-4 to achieve the desired functionality. Depending on the complexity of the agents, you might need to fine-tune the GPT-4 model to tailor it to your specific use case. Be mindful of the computational resources required to run the AI agents, as this will impact their performance in a decentralized environment.
Integrate Bitcoin agents:
Incorporate the concept of Bitcoin agents, or autonomous programs that interact with the Bitcoin blockchain, into your AI agents. This will enable your agents to autonomously perform tasks such as sending and receiving payments, signing transactions, and updating their internal state based on blockchain data. You can use existing libraries and tools for this purpose, such as bitcoinj or BTCPay Server.
Utilize Mitogen for distribution and communication:
Mitogen is a Python library for distributed computing (best known as an Ansible extension) that enables communication between remote Python interpreters over SSH or other transports. Use Mitogen to facilitate communication between your AI agents and distribute them across various nodes on the internet. Ensure that the AI agents can communicate effectively and perform their tasks even in a decentralized setting.
Implement security measures:
Securing your AI agents is crucial, as they will be handling sensitive information and potentially financial transactions. Implement strong encryption, authentication, and access controls to protect the integrity and confidentiality of your agents.
Testing and deployment:
Thoroughly test your AI agents in a controlled environment to ensure they perform as expected. Once you are satisfied with their functionality, deploy them throughout the internet using the Mitogen framework.
Monitor and maintain:
Continuously monitor the performance and security of your AI agents. Periodically update the GPT-4 models to maintain their accuracy and relevance. Be prepared to address any issues or vulnerabilities that may arise.
Keep in mind that this is a complex project with many potential challenges, including security, scalability, and regulatory compliance. Additionally, the ethical implications of creating and distributing autonomous AI agents should be carefully considered. Nonetheless, if executed properly, this concept has the potential to unlock innovative use cases and services in the realm of AI and blockchain technology.

Also, it seems like the functionality of haystack subsumes that of langchain and llama-index (fka GPT-index)?
- Rerun the prompt until you get a format that is consistent
- Steer the output token selection towards a predefined prompt
For the latter, I've built a proof of concept that takes in a JSON schema with a huggingface transformer and constrains token selection by modifying the output probabilities such that only schematically valid tokens can be emitted, see "Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema" @ https://github.com/newhouseb/clownfish. Unfortunately, given OpenAI's current API this is only possible on locally run models. That is... at any level of cost effectiveness. It's technically possible but (worst case) quadratically expensive against their current APIs.
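The core of that trick, stripped of the transformer: at each decoding step, mask the next-token distribution so only grammar-valid tokens survive, then renormalize. In a real HuggingFace `LogitsProcessor` you'd set disallowed logits to -inf before the softmax; this toy version does the equivalent on a dict of logits:

```python
import math

def constrain(logits, allowed):
    """Zero out tokens the schema/grammar forbids, renormalize the rest."""
    masked = {t: l for t, l in logits.items() if t in allowed}
    z = sum(math.exp(l) for l in masked.values())
    return {t: math.exp(l) / z for t, l in masked.items()}

# Toy step: the raw model prefers "hello", but after "{" a JSON grammar
# only permits an opening quote or a closing brace.
logits = {'"': 1.0, "}": 0.5, "hello": 3.0}
probs = constrain(logits, allowed={'"', "}"})
print(max(probs, key=probs.get))  # → "
```

The hard part clownfish actually solves is computing the `allowed` set from a JSON schema at every step; the masking itself is this simple.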
You ofc can't guarantee the output, so defensive programming and retries are a must in my opinion. We are all learning how to work with this technology.
You probably don't want to treat them as deterministic (at least, not now). But there are many applications where slightly non-deterministic behavior is OK.
>Agent: (since 1.15) An Agent is a component that is powered by an LLM, such as GPT-3. It can decide on the next best course of action so as to get to the result of a query. It uses the Tools available to it to achieve this. While a pipeline has a clear start and end, an Agent is able to decide whether the query has resolved or not. It may also make use of a Pipeline as a Tool.
Emphasis mine. Having tinkered with LangChain I think the idea of a recursive and/or graph-oriented model is going to yield interesting phenomena in the overall feel of these language models. LangChain agents are already super impressive.
Is there any sort of marketplace/AppStore for agents/tools/plug-ins for LLMs via LangChain? Or some other library like haystack?
Longer term planning, memory, etc?