viraptor 7mo ago

> we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility.

It's a bit better than just finding related pairs. And that's with Sonnet 3.5, which is basically ancient at this point.

tovej 7mo ago

This paper centers "novelty" but also finds that human ideas are more feasible, that LLM-generated ideas are not diverse, and that LLMs cannot reliably evaluate ideas. None of the ideas were actually evaluated by performing experiments, either.

Pretty much what I would expect. The paper also seems to be doing exactly what I described; I don't understand how the technique is better than that.

https://arxiv.org/abs/2409.04109
