> You're here using "ground truth" in some kind of grand epistemic sense
I used the word "ground truth" because you did!

>> in agent loops with access to ground truth about whether things compile and pass automatic acceptance.
Your critique of my usage of "ground truth" is the same critique I'm giving you! You really are doing a good job of making me feel like I'm going nuts...

> the information an LLM natively operates with,
And do you actually know what this is? I am an ML researcher, you know, and one of those who keeps saying "you should learn the math." There's a reason for that: it's directly connected to what you're talking about here. The models are opaque, but they sure aren't black boxes.
And it really sounds like you think the "thinking" tokens are even remotely representative of the internal processing. You're a daily HN user, so I'm pretty sure you saw this one[0].
I'm not saying anything OpenAI hasn't said[1]. I just recognize that this applies to more than one very specific, narrow case...
[0] https://news.ycombinator.com/item?id=44074111
[1] https://cdn.openai.com/pdf/34f2ada6-870f-4c26-9790-fd8def563...