What I’m seeing here is that you don’t yet have mature mechanisms to assure the reliability of your services. The second paragraph suggests that a misconfiguration made it into production, one that arguably should have been caught at an earlier stage of the deployment pipeline. Anyone can make these sorts of mistakes; the fact that a particular colleague is more prone to them really doesn’t matter all that much.
Fortify your delivery pipeline so that mistakes like this are caught before they reach production, and the problem should resolve itself.
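
To make "catching it earlier" a bit more concrete, here is a minimal sketch of a configuration check that a pipeline stage could run before deploying. Everything in it is an assumption for illustration: the key names (`service_name`, `replicas`, `timeout_seconds`), the rules, and the function names are hypothetical and not taken from your setup.

```python
# Hypothetical rules: required keys and the types they are allowed to have.
REQUIRED_KEYS = {
    "service_name": (str,),
    "replicas": (int,),
    "timeout_seconds": (int, float),
}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means the config passed."""
    errors = []
    for key, expected_types in REQUIRED_KEYS.items():
        if key not in config:
            errors.append(f"missing required key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_types):
            errors.append(f"{key} has unexpected type: {type(value).__name__}")

    # Example of a value rule on top of the type rules (assumed, not from the question).
    replicas = config.get("replicas")
    if isinstance(replicas, int) and not isinstance(replicas, bool) and replicas < 1:
        errors.append("replicas must be at least 1")
    return errors


def exit_code(config):
    """Map the validation result to a process exit code a pipeline step can act on."""
    return 1 if validate_config(config) else 0
```

A pipeline step would load the config that is about to ship, call `exit_code` on it, and fail the build on a non-zero result. The point is not these particular rules but that the check runs for everyone, every time, regardless of who made the change.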