> But having a small sample size doesn't make it any more likely to find a false positive.

It does. Let's test a die for load. Say your prior probability of the die being loaded is 50%, because this is a real shady place you're gambling in. You further know (based on the game you're playing) that if your die is loaded, it will land with these frequencies:
1: 1/3 of the time.
2: 1/6 of the time.
3: 1/6 of the time.
4: 1/6 of the time.
5: 1/6 of the time.
6: almost never.
Now, you will throw the die on the table a number of times to test it for load. Each throw gives you some evidence. If I've got my calculations correct, landing a 6 nearly guarantees the die isn't loaded, landing a 1 gives you one bit of evidence that it is loaded, and landing anything else doesn't tell you anything.
Now, what is the probability of a false positive? Well… with only one throw, a genuine die will land a 1 one time out of six, giving you a posterior probability distribution of 2/3 loaded, 1/3 genuine (this is as close as you will get to a false positive).
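The single-throw update is just Bayes' rule. A minimal sketch, taking "almost never" for a loaded 6 as exactly zero and using exact fractions:

```python
from fractions import Fraction

# Face frequencies for the hypothesized loaded die ("almost never" -> 0)
# and for a genuine (fair) die.
loaded = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
          4: Fraction(1, 6), 5: Fraction(1, 6), 6: Fraction(0)}
fair = {face: Fraction(1, 6) for face in range(1, 7)}
prior = Fraction(1, 2)  # 50% prior that the die is loaded

def posterior_loaded(face):
    # Bayes' rule: P(loaded | face) = P(face | loaded) P(loaded) / P(face)
    num = loaded[face] * prior
    return num / (num + fair[face] * (1 - prior))

print(posterior_loaded(1))  # 2/3: one bit of evidence for "loaded"
print(posterior_loaded(6))  # 0: a six rules the loaded die out
print(posterior_loaded(3))  # 1/2: no evidence either way
```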
With 2 throws, it's a bit more complicated (outcome probabilities assuming a genuine die, order ignored):
1, 1        : 1/36  : loaded with 80% probability
1, [2-5]    : 8/36  : loaded with 67% probability
any 6       : 11/36 : definitely genuine
[2-5], [2-5]: 16/36 : no evidence
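You can check the table by brute force: enumerate all 36 ordered two-throw outcomes under a genuine die and group them by posterior. A sketch under the same assumptions as before (50% prior, loaded 6 taken as zero):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

loaded = {1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(1, 6),
          4: Fraction(1, 6), 5: Fraction(1, 6), 6: Fraction(0)}
fair_p = Fraction(1, 6)
prior = Fraction(1, 2)

groups = defaultdict(Fraction)  # posterior -> probability of seeing it
for throws in product(range(1, 7), repeat=2):
    p_loaded = prior
    p_fair = 1 - prior
    for face in throws:
        p_loaded *= loaded[face]
        p_fair *= fair_p
    posterior = p_loaded / (p_loaded + p_fair)
    groups[posterior] += fair_p * fair_p  # outcome probability, genuine die

for posterior, prob in sorted(groups.items(), reverse=True):
    print(f"P(loaded) = {posterior}, seen with probability {prob}")
```

This prints four groups matching the table: 4/5 with probability 1/36, 2/3 with 8/36, 0 with 11/36, and 1/2 with 16/36.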
And so on, as you throw the die over and over again. I'll spare you the calculations, but the gist is simple: the die gets more and more chances to eventually land a 6, rendering the "definitely genuine" observation more and more probable (1 - (5/6)^number_of_throws), and the false positives less and less believable.
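That geometric term is easy to tabulate, which shows how fast a genuine die clears itself:

```python
# With a genuine die, the chance of having rolled at least one 6
# (and thus being marked "definitely genuine") after n throws is 1 - (5/6)^n.
for n in (1, 5, 10, 25, 50):
    p_six = 1 - (5 / 6) ** n
    print(f"{n:2d} throws: P(at least one 6) = {p_six:.4f}")
```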
Okay, this is a contrived example. But sufficiently large sample sizes do indeed reduce the risk of false positives. It's just that some results are so clear cut that they don't need large sample sizes to reach a conclusion reliably.