Aside from what the sibling says about the difference between the test harness code and the code being benchmarked, there's a more abstract point: you want the compiler to reduce the benchmarked code to a compile-time constant if and only if, in the real-world cases you're trying to model, it would be able to do the same. That's pretty rare, because if the compiler can do that, you probably wouldn't need to do performance analysis on that code in the first place.
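As a concrete sketch of that failure mode (in Rust, using `std::hint::black_box`; the workload function here is hypothetical): if a benchmark feeds a pure function a literal constant, the optimizer can fold the whole thing away at compile time, so the benchmark measures nothing. Hiding the input and output from the optimizer models the real-world case where the data only arrives at runtime.

```rust
use std::hint::black_box;

// Toy workload (hypothetical): with a constant input, the optimizer
// is free to evaluate this entire loop at compile time.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|i| i * i).sum()
}

fn main() {
    // black_box hides the value from the optimizer, so the call can't be
    // constant-folded -- mirroring code whose input isn't known statically.
    let n = black_box(1000u64);
    let result = sum_of_squares(n);
    // Also hide the result, so the computation isn't dead-code eliminated
    // just because nothing appears to depend on it.
    black_box(result);
    println!("{}", result);
}
```

Whether you *want* those hints in a given benchmark is exactly the judgment call above: only suppress the optimization if real callers would also defeat it.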
"These days I find myself telling people that benchmark numbers don’t matter on their own. It’s important what models you derive from those numbers. Refined performance models are by far the noblest and greatest achievement one could get with the benchmarking — it contributes to understanding how computers, runtimes, libraries, and user code work together." --Aleksey Shipilёv, https://shipilev.net/blog/2014/nanotrusting-nanotime/
That's a bit of an obscure remark, but I keep coming back to it as I learn about performance work and benchmarking.