Should we research all of them and try again with a big table of hits and misses from the first attempt? Seems like a lot of work.
Also: is the generated response really "very good at matching the correct answer"? I suppose it works because the search engine's language processing filters out the useless parts generated by the AI (a sort of "human ABI", analogous to the C ABI?), but a more direct query (e.g. "height of everest") would likely be just as effective.