Better HN

0 points · baq · 7mo ago · 0 comments

Read about the jagged frontier. IanCal is right: this is a perfect example of using the tool wrong. You've focused on a very narrow use case that is surprisingly hard for the matmuls not to mess up, and then extrapolated from it. But extrapolation is incorrect here, because the capability frontier is fractal, not continuous.

grey-area · 7mo ago

It's not surprisingly hard at all when you consider that they have no understanding of the tasks they perform or of the subject material. It's just a good example of the kind of task (anything requiring reliability or correct results) that they are fundamentally unsuited to.

Sadly it seems the best use-case for LLMs at this point is bamboozling humans.

baq (OP) · 7mo ago

When you take a step back, it's surprising that these tools can be useful at all on nontrivial tasks, but being surprised doesn't matter in the grand scheme of things. Bamboozling is rare enough for harnesses to keep them in line, and the ability to self-correct at inference time, when the bamboozling is detected either by the model itself or by the harness, is very useful, at least in my work. It's a question of using the tool correctly and understanding its limitations, which is hard if you aren't willing to explore the boundaries and commit to doing so basically every month.
