> But having a small sample size doesn't make it any more likely to find a false positive.

It does. Let's test a die for load. Say your prior probability of the die being loaded is 50%, because this is a real shady place you're gambling in. You further know (based on the game you're playing) that if your die is loaded, it will land with these frequencies:
1: 1/3 of the time.
2: 1/6 of the time.
3: 1/6 of the time.
4: 1/6 of the time.
5: 1/6 of the time.
6: almost never.
Now, you will throw the die on the table a number of times to test it for load. Each throw gives you some evidence. If I've got my calculations correct, landing a 6 nearly guarantees the die isn't loaded, landing a 1 gives you one bit of evidence that it is loaded, and landing anything else doesn't tell you anything.
Now, what is the probability of a false positive? Well… with only one throw, a genuine die will land a 1 one time out of six, giving you a posterior probability distribution of 2/3 loaded, 1/3 genuine (this is as close as you will get to a false positive).
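The single-throw update is just Bayes' rule. A minimal sketch, taking "almost never" for a loaded 6 as exactly zero and using exact fractions:

```python
from fractions import Fraction

# Face frequencies for the hypothesized loaded die ("almost never" -> 0)
# and for a genuine (fair) die.
loaded = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
          4: Fraction(1, 6), 5: Fraction(1, 6), 6: Fraction(0)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}
prior = Fraction(1, 2)  # 50% prior that the die is loaded

def posterior_loaded(face):
    # Bayes' rule: P(loaded | face) = P(face | loaded) P(loaded) / P(face)
    num = loaded[face] * prior
    return num / (num + fair[face] * (1 - prior))

print(posterior_loaded(1))  # 2/3: one bit of evidence for "loaded"
print(posterior_loaded(6))  # 0: a six rules the loaded die out
print(posterior_loaded(3))  # 1/2: no evidence either way
```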
With 2 throws, it's a bit more complicated (outcome probabilities assuming a genuine die, order ignored):
1, 1        : 1/36  : loaded with 80% probability
1, [2-5]    : 8/36  : loaded with 67% probability
any 6       : 11/36 : definitely genuine
[2-5], [2-5]: 16/36 : no evidence
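You can check the table by brute force: enumerate all 36 ordered two-throw outcomes under a genuine die and group them by posterior. A sketch under the same assumptions as before (50% prior, loaded 6 taken as zero):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

loaded = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
          4: Fraction(1, 6), 5: Fraction(1, 6), 6: Fraction(0)}
fair_p = Fraction(1, 6)
prior = Fraction(1, 2)

groups = defaultdict(Fraction)  # posterior -> probability of seeing it
for throws in product(range(1, 7), repeat=2):
    p_loaded = prior
    p_fair = 1 - prior
    for face in throws:
        p_loaded *= loaded[face]
        p_fair *= fair_p
    posterior = p_loaded / (p_loaded + p_fair)
    groups[posterior] += fair_p * fair_p  # outcome probability, genuine die

for posterior, prob in sorted(groups.items(), reverse=True):
    print(f"P(loaded) = {posterior}, seen with probability {prob}")
```

This prints four groups matching the table: 4/5 with probability 1/36, 2/3 with 8/36, 0 with 11/36, and 1/2 with 16/36.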
And so on, as you throw the die over and over again. I'll spare you the calculations, but the gist is simple: the die gets more and more chances to eventually land a 6, rendering the "definitely genuine" observation more and more probable (1 - (5/6)^number_of_throws), and the false positives less and less believable.
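That geometric term is easy to tabulate, which shows how fast a genuine die clears itself:

```python
# With a genuine die, the chance of having rolled at least one 6
# (and thus being marked "definitely genuine") after n throws is 1 - (5/6)^n.
for n in (1, 5, 10, 25, 50):
    p_six = 1 - (5 / 6) ** n
    print(f"{n:2d} throws: P(at least one 6) = {p_six:.4f}")
```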
Okay, this is a contrived example. But sufficiently large sample sizes do indeed reduce the risk of false positives. It's just that some results are so clear cut that they don't need large sample sizes to reach a conclusion reliably.