I don't know the merit of what the parent is saying, but it does make some intuitive sense if you think about it. As the context fills up, the LLM places less attention on tokens further and further back in the context, which is why the LLM seems dumber and dumber as a conversation goes on. If you put 5 instructions in the system prompt or initial message, where one acts as a canary, then you can more easily see when exactly it stops following the instructions.
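A minimal sketch of what I mean by a canary, assuming a generic chat setup (the rules, the `OVER` marker, and the function names here are all made up for illustration):

```python
# Hypothetical canary setup: one cheap, easy-to-check rule is buried among
# the real instructions. If replies stop ending with "OVER", that's the
# signal the model has stopped following the instruction block.

SYSTEM_PROMPT = """You are a coding assistant. Follow all of these rules:
1. Answer in English.
2. Prefer standard-library solutions.
3. Explain trade-offs briefly.
4. Never invent APIs.
5. End every reply with the word OVER.
"""

def canary_followed(reply: str) -> bool:
    """True if the reply still obeys rule 5 (the canary)."""
    return reply.rstrip().endswith("OVER")
```

In a real session you'd run `canary_followed` on each assistant turn and note the first turn where it returns `False`.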
Personally, I always go for a one-shot answer, and if it gets it wrong or misunderstands, I restart from the beginning, adjust the prompt, and retry. It seems to me all current models get a lot worse quickly once there is some back and forth.
It absolutely is folk magic. I think it is more accurate to impugn your understanding than mine.
> I don't know the merit to what parent is saying, but it does make some intuitive sense if you think about it.
This is exactly what I mean by folk magic. Incantations based on vibes. One's intuition is notoriously inclined to agree with one's own conclusions.
> If you put 5 instructions in the system prompt or initial message, where one acts as a canary, then you can easier start to see when exactly it stops following the instructions.
This doesn't really make much sense.
First of all, system prompts and files like agent.md never leave the context, regardless of session length, so the canary has absolutely zero meaning in this situation; any judgement based on its disappearance is totally misguided and simply a case of seeing what you want to see.
Further, even if it did leave the context, that wouldn't demonstrate that the model is "not paying attention". Presumably whatever is in the context is relevant to the task, so if your definition of "paying attention" is "it exists in the context", the model is actually paying better attention once it has replaced the canary with relevant information.
Finally, this reasoning relies on the misguided idea that when the model produces an output that doesn't correspond to an instruction, the instruction must have escaped the context, rather than it simply being a sequence where the model does the wrong thing, which is a regular occurrence even in short sessions that are obviously within the context window.
You're focusing on the wrong thing, ironically. Even if things are in the context, attention is what matters. The intuition isn't about whether that thing is included in the context or not; as you say, it always will be. It's about whether the model will pay attention to it, in the Transformer sense, which it doesn't always do.
So, true creativity, basically? lol
I mean, the reason why programming is called a “craft” is because it is most definitely NOT a purely mechanistic mental process.
But perhaps you still harbor that notion.
Ah, I suddenly realized why half of all developers hate AI-assisted coding (I am in the other half). I was a Psych major, so code was always more “writing” than “gears” to me… It was ALWAYS “magic.” The only job where literally writing down words in a certain way produces machines that eliminate human labor. What better definition of magic is there, actually?
I’ll never forget the programmer _why. That guy’s Ruby code was 100% art and “vibes.” And yet it worked… Brilliantly.
Does relying on “vibes” too heavily produce poor engineering? Absolutely. But one can be poetic while staying cognizant of the haiku restrictions… O-notation, untested code, unvalidated tests, type conflicts, runtime errors, fallthrough logic, bandwidth/memory/IO costs.
Determinism. That’s what you’re mad about, I’m thinking. And I completely get you there: how can I consider a “flaky test” to be an all-hands-on-deck affair while praising code output from a nondeterministic machine running off arbitrary prompt words that we don’t, and can’t, even know are optimal?
Perhaps because humans are also nondeterministic, and yet we somehow manage to still produce working code… Mostly. ;)
Folk magic is (IMO) a necessary step in our understanding of these new… magical… tools.
This is not entirely true. They pay the most attention to the earliest and the most recent parts of the context, with the dip in the middle between the two. Which basically means that the system prompt (which is always on top) is always going to get attention. Or, perhaps, it would be more accurate to say that because they are trained to follow the system prompt, which comes first, that's what they do.