You said:
> it's all very well to say "expect to lose an AZ" but during this outage it's not been physically possible to remove the broken AZ instances from multi-AZ services because we cannot physically get them to respond to or acknowledge commands
"Expect to lose an AZ" includes not being able to make any changes to existing instances in the affected AZ.
If you have instances across multiple AZs behind an ELB with health checks, the ELB should automatically stop routing traffic to the affected instances once they fail those checks.
If you have a different architecture, you would want to:
* Have another mechanism that automatically stops sending traffic to impaired instances (ideal), or
* Have a way to manually take the instances out of service that does not depend on interacting with or modifying those instances in any way
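To make the two options concrete, here is a rough sketch of the idea in Python. It is not how ELB is implemented, and every name in it (`BackendPool`, `probe`, `drain`, `unhealthy_threshold`) is made up for illustration. The point it tries to show is that both the automatic path and the manual path only change state on the routing side, so neither needs the broken instance to respond to anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class BackendPool:
    """Routing-side view of backends (hypothetical; for illustration only)."""
    backends: List[str]
    # probe(backend) returns True if the backend answered its health check.
    probe: Callable[[str], bool]
    # Consecutive failed checks before a backend stops receiving traffic.
    unhealthy_threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)
    drained: Set[str] = field(default_factory=set)

    def run_health_checks(self) -> None:
        """Probe every backend once and update its consecutive-failure count."""
        for backend in self.backends:
            try:
                ok = self.probe(backend)
            except Exception:
                # A timeout or connection error counts as a failed check.
                ok = False
            self.failures[backend] = 0 if ok else self.failures.get(backend, 0) + 1

    def drain(self, backend: str) -> None:
        """Manually take a backend out of service without contacting it."""
        self.drained.add(backend)

    def in_service(self) -> List[str]:
        """Backends that should currently receive traffic."""
        return [
            b for b in self.backends
            if b not in self.drained
            and self.failures.get(b, 0) < self.unhealthy_threshold
        ]
```

A caller would run `run_health_checks()` on a timer and pick targets from `in_service()`. An instance in a dead AZ simply stops appearing there after a few failed checks, and `drain()` covers the case where you want it out immediately.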
Does that help, or have I misunderstood your problem?