I think you didn't read my proposed methodology carefully.
Box     Researcher  |  Owner of phone (outside room)
[ ]~    x           |  o
The researcher speaks about some subjects, but not others. The ~ represents that in some cases the researcher's conversation is fed into the box, and in others it is not. The box is otherwise soundproof, and inside it is the phone being tested.
We can pick subjects such as:
- #1 Adult incontinence
- #2 Cat food
- #3 Last-minute trip
- #4 ..
- ..
- #10
In the test group, the researcher's voice is fed into the box. In the control group, the researcher's voice is NOT fed into the box.
To keep the experiment double-blind, it is important that the researchers cannot hear or otherwise tell whether they are being amplified into the box.
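A minimal sketch of how the random assignment could be done, assuming ten topics split evenly between test and control. The function name `assign_conditions`, the placeholder topic labels, and the even split are my own assumptions for illustration. The point is that a third party (or a script) holds the assignment, so the researcher never learns which conversations are fed into the box.

```python
import random

def assign_conditions(topics, seed=None):
    """Randomly assign half of the topics to 'test' (voice fed into the box)
    and the rest to 'control' (voice not fed in).

    Hypothetical helper: the even split and the names are assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(topics)
    rng.shuffle(shuffled)
    test_topics = set(shuffled[: len(shuffled) // 2])
    # Preserve the original topic order in the returned mapping.
    return {t: ("test" if t in test_topics else "control") for t in topics}

# Placeholder labels; the real list would be the chosen topics #1..#10.
topics = [f"topic #{i}" for i in range(1, 11)]
assignment = assign_conditions(topics, seed=2024)
```

The mapping in `assignment` should stay hidden from both the researcher and the phone owner until all scores have been collected.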
The results might potentially look like this:
https://imgur.com/a/y7852
Of course, I just made this up. (I imagine the subjective 0-5 scores being whether the given subject reports seeing such an advertisement, from 0 definitely not to 5 definitely yes.) I even made subject 3 unsure about topics 1 and 3 to mimic that humans are fallible. Likewise subject 2 does not really report any advertisements. (This is likely in the real world - for example subject 2 could be explicitly excluded by advertisers for some reason.)
The attached graphs are the kind I would expect to see if dozens of scientifically-minded people tried this experiment.
If these are the two graphs that we got, and if the test and control groups were truly randomized, what other explanation could you offer?
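To make "what other explanation" concrete, one could ask how often a gap this large between test and control scores would show up by chance alone. Here is a rough sketch of a one-sided permutation test, assuming the 0-5 scores are collected per topic; the function name `permutation_p_value` and its parameters are my own, and this is only one of several reasonable ways to analyze such data.

```python
import random
from statistics import mean

def permutation_p_value(test_scores, control_scores, n_perm=10000, seed=0):
    """Estimate how often a random relabeling of the scores gives a
    test-minus-control mean difference at least as large as the observed one.

    Hypothetical helper for illustration, not a full analysis.
    """
    rng = random.Random(seed)
    observed = mean(test_scores) - mean(control_scores)
    pooled = list(test_scores) + list(control_scores)
    k = len(test_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Treat the first k shuffled scores as a fake "test" group.
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small value means random relabeling rarely reproduces the gap, which is what you would expect if feeding the voice into the box really changes which ads get reported. It only carries that meaning if the assignment was truly randomized and blinded.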
Of course, my proposed experiment is orders of magnitude more scientific than what people are doing with their n=1, unblinded personal experiments. But theirs has some validity also.