This sounds good in theory, but in practice it's really hard. Pretty quickly you end up with tests that "say" one thing while the underlying implementation behaves in subtly different ways.
Then try to debug a "document"...
I like the idea. But having tried it at scale, I can tell you it becomes a mess. Code I can understand. I can read English comments. I can't debug English.
I know what typical code does. This code looks simple, but that simplicity is misleading when you're trying to understand a failure. You want consistency and clarity. You want readability the way code is readable, not the way a book is readable.
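To make the "says one thing, does another" problem concrete, here's a minimal sketch. Everything in it is made up for illustration: the `step` decorator, the `STEPS` registry, the `run` helper and the coupon logic are hypothetical and are not any real framework's API. The scenario reads fine as English, but one step quietly does more than its sentence claims.

```python
# Hypothetical mini step registry, invented for this example only.
STEPS = {}

def step(text):
    """Register a function under an English sentence."""
    def register(fn):
        STEPS[text] = fn
        return fn
    return register

@step("the user has an empty cart")
def empty_cart(ctx):
    ctx["cart"] = []
    # The sentence says "empty cart", but this also seeds a coupon.
    ctx["coupons"] = ["WELCOME10"]

@step("the user adds a 100 dollar item")
def add_item(ctx):
    ctx["cart"].append(100)

@step("the total is 100 dollars")
def check_total(ctx):
    # A flat 10 dollar discount applies whenever the coupon is present.
    discount = 10 if "WELCOME10" in ctx["coupons"] else 0
    total = sum(ctx["cart"]) - discount
    assert total == 100, f"expected 100, got {total}"

def run(scenario):
    """Run each sentence in order; return (passed, message)."""
    ctx = {}
    for line in scenario:
        try:
            STEPS[line](ctx)
        except AssertionError as err:
            return False, f"{line!r} failed: {err}"
    return True, "ok"

SCENARIO = [
    "the user has an empty cart",
    "the user adds a 100 dollar item",
    "the total is 100 dollars",
]

passed, message = run(SCENARIO)
print(passed, message)
```

The scenario fails on the last sentence, and nothing in the English tells you why. You only find the coupon by reading the Python behind "the user has an empty cart", which is the part the "document" was supposed to spare you from.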