Your Browser Agent Is Thinking Too Hard

(100x.bot)

2 points | shardullavekar | 6mo ago | 3 comments

As someone who has tried almost all of the AI browsers that are accessible or in a relatively open beta, plus all the browser control frameworks and agents, I super agree with the notions behind this post.

Curious about your approach, though: is it a literal script, or an LLM being told to follow a deterministic script and only get subjective when necessary? Based on the blog, it looks like the former, but why not the latter? Have the LLM be pseudo-deterministic but still walk through the script step by step, so that it can handle UI changes and adjacent interfaces.

shardullavekar (OP) | 6mo ago

A workflow can have subjective parts too. For example, click on button A if it satisfies certain conditions I wrote in plain English, otherwise click on B.

These subjective elements can be defined with user inputs/prompts.

So a workflow is a literal script with embedded LLM calls for branching, or even for scraping details where a literal script feels tedious.
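
To make that concrete, here is a minimal sketch of a scripted workflow with one subjective branch. It is not taken from the blog post or from any real tool: `run_workflow`, `keyword_judge`, the action strings, and the "free shipping" condition are all hypothetical, and the judge is a keyword-matching stub standing in for an LLM call so that the example runs offline.

```python
from typing import Callable, List

# Hypothetical interface: a judge receives a plain-English condition and the
# visible page text, and answers yes or no. In a real workflow this would wrap
# an LLM call; that part is an assumption, not something the thread specifies.
Judge = Callable[[str, str], bool]


def keyword_judge(condition: str, page_text: str) -> bool:
    """Offline stand-in for an LLM: true if every word of the condition
    appears somewhere in the page text (case-insensitive)."""
    text = page_text.lower()
    return all(word in text for word in condition.lower().split())


def run_workflow(page_text: str, judge: Judge) -> List[str]:
    """Return the list of actions the script would take on this page."""
    actions: List[str] = []

    # Deterministic step: no model involved.
    actions.append("open checkout page")

    # Subjective branch: the only place the judge is consulted.
    if judge("free shipping", page_text):
        actions.append("click button A")
    else:
        actions.append("click button B")

    # Deterministic step again.
    actions.append("submit form")
    return actions


if __name__ == "__main__":
    print(run_workflow("Free shipping on all orders", keyword_judge))
    print(run_workflow("Shipping costs $5", keyword_judge))
```

Swapping `keyword_judge` for a function that sends the condition and page text to a model would keep the rest of the script unchanged, which is the point of keeping the subjective part behind a narrow interface.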

Sherveen | 6mo ago

Neat.
