>Right now, most LLMs with web search grounding are still in Stage 1: they can retrieve content, but their ability to assess quality, trustworthiness, and semantic ranking is still very limited.
Why do you think it is limited? Imagine you show a link with details to an LLM and ask it if it is trustworthy or high quality w.r.t the query, why can't it answer it?