undefined | Better HN

0 pointsToValueFunfetti1y ago0 comments

>The problem is that you don't know which 5% are wrong

This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

Because I'd love to drop the analogy. You mention IDEs- I routinely use IntelliJ's tab completion, despite it being wrong >>5% of the time. I have to manually verify every suggestion. Sometimes I use it and then edit the final term of a nested object access. Sometimes I use the completion by mistake, clean up with backspace instead of undo, and wind up submitting a PR that adds an unused dependency. I consider it indispensable to my flow anyway. Maybe others turn this off?

You mention hospitals. Hospitals run loads of expensive tests every day with a greater than 5% false positive and false negative rate. Sometimes these results mean a benign patient undergoes invasive further testing. Sometimes a patient with cancer gets told they're fine and sent home. Hospitals continue to run these tests, presumably because having a 20x increase in specificity is helpful to doctors, even if it's unreliable. Or maybe they're just trying to get more money out of us?

Since we're talking LLMs again, it's worth noting that 95% is an underestimate of my hit rate. 4o writes code that works more reliably than my coworker does, and it writes more readable code 100% of the time. My coworker is net positive for the team. His 2% mistake rate is not enough to counter the advantage of having someone there to do the work.

An LLM with a 100% hit rate would be phenomenal. It would save my company my entire salary. A 99% one is way worse; they still have to pay me to use it. But I find a use for the 99% LLM more-or-less every day.

0 comments

gitremote1y ago

> This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

If you use an unreliable calculator to sum a list of numbers, you then need to use a reliable method to sum the numbers to validate that the unreliable calculator's sum is correct or incorrect.

ToValueFunfettiOP1y ago

Yes, so in my first example in the GP, this happens first. Humans do the work. The calculator double checks and gives me a list of all errors plus 5% of the non-errors, and I only need to double check that list.

In my third example, the calculator does the hard work of dividing, and humans can validate by the simpler task of multiplication, only having to do extra work 5% of the time.

(In my second, the unreliablity is a trade-off against speed, and we need the speed more.)

In all cases, we benefit from the unreliable tool despite not knowing when it is unreliable.

gitremote1y ago

In your first example, you appear to assume that for calculations where "each mistake could cost $millions or lives", engineers who calculated by hand typically didn't double-check by redoing the calculation, so a second check with a 95% accuracy tool is better than nothing. This assumption is false. I suggest you watch the 2016 film Hidden Figures to understand the level of safety at NASA when calculations were done by hand. You are suggesting lowering safety standards, not increasing them.

Your third example is unclear. No calculators can perform factoring of large numbers, because that is the expected ability of future quantum computers that can break RSA encryption. It is also unclear why multiplication and division have different difficulties, when dividing by n is equal to multiplying by 1/n.

2 more replies

ModernMech1y ago

I'd like to second the point made to you in this thread that went without reply: https://news.ycombinator.com/item?id=43702895

It's true that we use tools with uncertainty all the time, in many domains. But crucially that uncertainty is carefully modeled and accounted for.

For example, robots use sensors to make sense of the world around them. These sensors are not 100% accurate, and therefore if the robots rely on these sensors to be correct, they will fail.

So roboticists characterize and calibrate sensors. They attempt to understand how and why they fail, and under what conditions. Then they attempt to cover blind spots by using orthogonal sensing methods. Then they fuse these desperate data into a single belief of the robot's state, which include an estimate of its posterior uncertainty. Accounting for this uncertainty in this way is what keeps planes in the sky, boats afloat, and driverless cars on course.

With LLMs It seems like we are happy to just throw out all this uncertainty modeling and to leave it up to chance. To draw an analogy to robotics, what we should be doing is taking the output from many LLMs, characterizing how wrong they are, and fusing them into a final result, which is provided to the user with a level of confidence attached. Now that is something I can use in an engineering pipeline. That is something that can be used as a foundation to something bigger.

1 more reply

j / k navigate · click thread line to collapse

0 comments

gitremote1y ago

> This is not a problem in my unreliable calculator use-cases; are you disputing that or dropping the analogy?

If you use an unreliable calculator to sum a list of numbers, you then need to use a reliable method to sum the numbers to validate that the unreliable calculator's sum is correct or incorrect.

ToValueFunfettiOP1y ago

In my third example, the calculator does the hard work of dividing, and humans can validate by the simpler task of multiplication, only having to do extra work 5% of the time.

(In my second, the unreliablity is a trade-off against speed, and we need the speed more.)

In all cases, we benefit from the unreliable tool despite not knowing when it is unreliable.

gitremote1y ago

2 more replies

ModernMech1y ago

I'd like to second the point made to you in this thread that went without reply: https://news.ycombinator.com/item?id=43702895

It's true that we use tools with uncertainty all the time, in many domains. But crucially that uncertainty is carefully modeled and accounted for.

For example, robots use sensors to make sense of the world around them. These sensors are not 100% accurate, and therefore if the robots rely on these sensors to be correct, they will fail.

1 more reply

j / k navigate · click thread line to collapse