I'm talking about running week- or month-long tests with a control cell and multiple test cells containing new functionality, configuration, or code. The goal is to determine the viability of a single change, or a combination of changes, by analyzing the statistical output against pre-determined target metrics, using p-values to judge significance.
These types of experiments are extremely valuable for uncovering hard-to-find bugs, assuming you have sufficient logging and confidence in your metrics. They tell you that a problem exists and roughly where it is in the product. From there you can drill down into your source code until you find the cause of the discrepancy.
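To make the statistical side a bit more concrete, here is a minimal sketch of how a test cell might be compared against control on a single rate-style metric. It uses a standard two-proportion z-test, which is my choice for illustration and not necessarily what any particular experimentation platform runs. The function names, the cell names, the counts, and the 0.05 threshold are all hypothetical, and the sketch does not correct for comparing several cells at once.

```python
import math


def two_proportion_p_value(control_hits, control_n, test_hits, test_n):
    """Two-sided p-value for a difference in rates between two cells.

    Uses a pooled two-proportion z-test (an assumption for this sketch).
    """
    p_control = control_hits / control_n
    p_test = test_hits / test_n
    pooled = (control_hits + test_hits) / (control_n + test_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / test_n))
    if std_err == 0:
        # Both cells are all hits or all misses: no measurable difference.
        return 1.0
    z = (p_test - p_control) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def flag_cells(control, cells, alpha=0.05):
    """Return names of test cells whose metric differs from control.

    `control` is a (hits, users) tuple; `cells` maps a hypothetical cell
    name to its own (hits, users) tuple. No multiple-comparison correction.
    """
    flagged = []
    for name, (hits, n) in cells.items():
        p = two_proportion_p_value(control[0], control[1], hits, n)
        if p < alpha:
            flagged.append(name)
    return flagged


# Made-up counts purely for illustration.
control = (500, 10_000)
cells = {"cell_a": (505, 10_000), "cell_b": (650, 10_000)}
print(flag_cells(control, cells))
```

A flagged cell is not proof of a bug. It is the signal that something in that cell behaves differently from control, which tells you where to start digging through logs and source code.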