If I use spinnaker and chaos monkey in a prod scale (or even in a prod experiment) to create a circumstance where I can't perform a write of a resource because I couldn't achieve quorum in a replica set and that let to an inconsistency between two data stores... is that meaningfully different than observing the same issue but caused by incorrect vpc routing or insufficienty resourced test instances leading/slow node startup times/race conditions in a CI test environment?
I think there is overlap and that it does not have to be a choice between either approach.