In the example case, without having access to the issue text (the evil data), how does the main LLM actually figure out what to do if the quarantined LLM can just answer with digits?
Sure it can discover that it's a request to update the documentation, but how does it get the information it needs to actually change the erroneous part of the documentation?