> we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility.
It's a bit better than just finding related pairs. And that's with Sonnet 3.5, which is basically ancient at this point.