Yeah, when the author was writing about that initial query about delay-per-unit-length, I was thinking: "This doesn't tell us whether an LLM can apply the concepts, only whether relevant text was included in its training data."
It's a distinction I fear many people will have trouble keeping in mind when faced with the misleading eloquence of LLM output.