You're comparing LLMs to a hypothetical alternative where a human reviews all 30k documents in detail. But the real alternative is often just a worse quality sieve where more errors blunder their way through the existing flawed processes. LLMs can improve on that.
You're right, I am comparing it to that alternative. There are fields and applications where that level of review is necessary. I do not know if drilling reports are one of them. If you can tolerate a large false-negative rate then great. But if you need to be catching 99.99% of problems then IMO you should at least be able to show your work. Taking black-box output and throwing it over the wall sounds so sketchy in engineering contexts.
So if my ass were on the line for the correctness of an AI-written program across 30k cases of parsing unstructured or mixed data, I would be extremely careful. That is my point.
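To make the 99.99% point concrete, here's a back-of-envelope sketch. The 1% defect rate and the false-negative rates are made-up assumptions purely for illustration, not numbers from any real drilling-report dataset:

```python
# Back-of-envelope: expected missed defects when screening 30k documents
# with an imperfect sieve. All rates below are illustrative assumptions.

def expected_misses(n_docs: int, defect_rate: float, false_negative_rate: float) -> float:
    """Expected number of real problems the sieve lets through."""
    return n_docs * defect_rate * false_negative_rate

N = 30_000
DEFECT_RATE = 0.01  # assume 1% of documents contain a real problem (hypothetical)

for fnr in (0.10, 0.01, 0.0001):
    print(f"FNR {fnr:>7.2%}: ~{expected_misses(N, DEFECT_RATE, fnr):.2f} problems missed")
```

Under these assumptions there are ~300 real problems in the pile; a 10% false-negative rate lets ~30 through, while hitting the 99.99% bar (FNR of 0.01%) means missing essentially none. The gap between those outcomes is the whole argument about tolerable error rates.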