The GRIM test – a method for evaluating published research

(medium.com)

79 points · maxharlow · 10y ago · 18 comments

18 comments

canjobear 10y ago

Another simple thing you can test in a paper to see if it is credible is p-curve and related methods from Uri Simonsohn et al.

http://www.p-curve.com/

You just look at the distribution of the significant p-values used to support the authors' hypotheses. If they bunch up just below .05 instead of piling up near zero, then something fishy is going on.
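To make that check concrete, here is a minimal sketch. It is a crude stand-in, not the procedure the p-curve site runs: it splits the significant p-values at alpha/2 and uses a one-sided binomial test for bunching in the upper half. The function name and the example p-values are made up for illustration.

```python
from math import comb

def p_curve_summary(p_values, alpha=0.05):
    """Split significant p-values at alpha/2 and test for bunching near alpha."""
    sig = [p for p in p_values if p < alpha]
    high = sum(1 for p in sig if p > alpha / 2)
    n = len(sig)
    # One-sided binomial test: chance of seeing at least `high` values in the
    # upper half if each significant p were equally likely to land in either half.
    tail = sum(comb(n, k) for k in range(high, n + 1)) / 2 ** n if n else 1.0
    return {"n_significant": n, "upper_half": high, "binomial_p": tail}

# Hypothetical p-values as they might be reported in a single paper.
print(p_curve_summary([0.041, 0.048, 0.039, 0.044, 0.012, 0.2]))
```

With only a handful of p-values the test has very little power, so a result like this is a reason to look closer, not a verdict.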

jldugger 10y ago

Interesting approach -- get the scientific community to agree on the mathematical principles first, before anyone specifically is outed as cheating.

But this article feels like reading a teaser chapter of a bigger story.

> The amount of (toil) required to actually create data like this from scratch is (very) nightmarish. It’s a task drastically out of reach of the (foolish people) who’d try such a bush league stunt in the first place.

This assumes that all experiments lead to publications. We know there's a strong publication bias, and that the bias favors positive results, and dramatically favors unintuitive positive results. Which means you need to find correlations where none were expected. How many experiments do you need to get a significant correlation when there is none? Hint: more than one.
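The "more than one" can be put into numbers with a rough simulation. This is a sketch under stated assumptions: two groups of 30 drawn from the same normal distribution, a two-sided z-test with known variance, and a .05 threshold. None of these choices come from the article.

```python
import math
import random

def null_experiment_p(rng, n=30):
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

def experiments_until_significant(rng, alpha=0.05):
    """Count experiments run until one comes out 'significant' by chance."""
    count = 1
    while null_experiment_p(rng) >= alpha:
        count += 1
    return count

rng = random.Random(0)
runs = [experiments_until_significant(rng) for _ in range(1000)]
# Analytically the expected count is 1 / alpha = 20.
print(sum(runs) / len(runs))
```

So with no real effect at all, a publishable "finding" costs about twenty tries on average, and none of the nineteen failures need ever be reported.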

It's also worth noting that it wouldn't be difficult to produce a genetic algorithm using various statistical checks, including this one, as a fitness function.

moyix 10y ago

Just to address your comment about publication bias – we do have techniques for detecting that across a set of studies on the same subject, namely funnel plots:

https://en.wikipedia.org/wiki/Funnel_plot

jldugger 10y ago

Wasn't implying we don't. Just that the calculus of data fraud is weighted more towards fraud than the article suggests.

bayesian_horse 10y ago

I don't agree with the idea that faking data is more difficult than running the experiment. A lot of university courses now teach R or something similar for running statistics. Relatively simple Monte Carlo simulations would produce results that satisfy the GRIM test.
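For reference, the check itself is only a few lines. The sketch below assumes integer-valued responses and a mean reported to a fixed number of decimals; the function name, the 1e-9 tolerance, and the 1 to 7 response scale are my own choices, not taken from the article. It also illustrates the point above: any mean computed from simulated integer data passes automatically.

```python
import math
import random

def grim_consistent(mean, n, decimals=2):
    """True if some integer total over n integer responses rounds to `mean`."""
    half_unit = 0.5 * 10 ** -decimals
    target = mean * n
    # The integer nearest to mean * n is either the floor or the ceiling,
    # so those are the only two totals that need checking.
    for total in (math.floor(target), math.ceil(target)):
        # Accept either rounding direction at the boundary (tolerance 1e-9).
        if abs(total / n - mean) <= half_unit + 1e-9:
            return True
    return False

print(grim_consistent(5.19, 28))  # no integer total over 28 people gives 5.19
print(grim_consistent(5.18, 28))  # 145 / 28 = 5.1786, which rounds to 5.18

# Simulated integer responses: the rounded mean is GRIM-consistent by construction.
rng = random.Random(1)
sample = [rng.randint(1, 7) for _ in range(28)]
fake_mean = round(sum(sample) / len(sample), 2)
print(grim_consistent(fake_mean, 28))
```

GRIM catches numbers that were typed in or adjusted by hand, not numbers generated from a fabricated raw dataset.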

michaelmior 10y ago

While it's true that fractional ages are pretty much never used, ages do not necessarily have to be recorded as whole numbers.

LionessLover 10y ago

Since few people will find it, the response on the web page is

https://medium.com/@jamesheathers/a-lot-of-people-are-hung-u...

> A lot of people are hung up the fact that it’s possible to collect ages in finer grains — to the nearest month, or day, etc.

> Remember a) this is just a hypothetical illustration and b) the vast majority of the time, age is collected exactly how I’ve described here.

moyix 10y ago

A more practical problem with applying this on age data specifically is that everyone I looked at (granted, only a handful of CS papers) only gave one decimal place, not two.
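The decimal-place issue can be quantified with a rough sketch. Assuming integer responses and looking only at the fractional part of the reported mean, it counts how many of the possible reported values no integer total could produce. The function name and the sample sizes are illustrative.

```python
def fraction_impossible(n, decimals):
    """Share of reportable fractional parts that no integer total can produce."""
    steps = 10 ** decimals
    impossible = 0
    for r in range(steps):  # reported fractional part is r / steps
        # Consistent if some total k satisfies |k/n - r/steps| <= 0.5/steps,
        # written in integers to avoid float trouble at the rounding boundary.
        if not any(abs(2 * k * steps - 2 * r * n) <= n for k in range(n + 1)):
            impossible += 1
    return impossible / steps

print(fraction_impossible(28, 2))  # two decimals, n = 28
print(fraction_impossible(28, 1))  # one decimal, n = 28
print(fraction_impossible(7, 1))   # one decimal only helps for tiny samples
```

With one decimal place every value is reachable once n gets to 10 or so, which means the test cannot flag anything for typical sample sizes.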

_bdog 10y ago

I work with a psychology professor (was head of department) now and then. She said there's a large problem with students, even graduates, just not understanding statistics and math properly (or at all).

untilHellbanned 10y ago

Same is true in biology

davidgerard 10y ago

Medical researcher rediscovers integration, gets 75 citations https://fliptomato.wordpress.com/2007/03/19/medical-research...

bluenose69 10y ago

Some of these might be simple errors, with results being typed in from other documents. Authors who are worried about making such errors might want to consider using methods of reproducible research, e.g. writing "blah had mean value `round(mean(x), 3)` (n=`length(x)`)" or similar in Sweave, where the items in the back-ticks are R code working on the actual data. This is a bit more work, but it prevents transcription errors, and also a pernicious type of error that comes about by adjusting the data analysis during the writing process.

bane 10y ago

Also see: https://en.wikipedia.org/wiki/Benford%27s_law for a related test used in various fields.
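For anyone who hasn't met it, the law says the leading digit d shows up with frequency log10(1 + 1/d) in many naturally occurring datasets. Below is a minimal sketch comparing observed leading digits against that expectation. The helper names are made up, and the powers of two are just a convenient sequence known to follow the law closely.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each leading digit, ignoring zeros."""
    # Scientific notation puts the leading digit first, e.g. '1.267651e+30'.
    digits = [int(f"{abs(v):e}"[0]) for v in values if v]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

expected = benford_expected()
observed = first_digit_shares([2 ** k for k in range(1, 101)])
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Like GRIM, a mismatch is a prompt to look closer, not proof of anything: plenty of honest datasets (ages, heights, anything with a narrow range) don't follow Benford's law at all.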

moyix 10y ago

Wow, this is a really clever technique, and the results are really alarming. I suspect we'll see some careers unravel as a result of it.

xiphias 10y ago

Not really... there's a reason why psychology is not considered science by non-psychology scientists: it's not because the research couldn't be done in a scientifically useful way, but because even the teachers don't care about math. I think Facebook, Google, and other ad agencies have a better model of human behavior than psychologists do (although they only have data on a specific part of human behavior).

progers7 10y ago

I've done a similar trick where the ratio of two secret integers is released publicly with many significant digits and you can sometimes find the two integers by brute forcing the division over all possible values. Does anyone know a name for this approach?

kurlberg 10y ago

Brute force is not needed; check out "continued fraction" (in particular section "11. Best rational approximations" on Wikipedia).
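In Python the standard library already does this: `Fraction.limit_denominator` returns the closest fraction with a bounded denominator. A minimal sketch, where the published ratio and the denominator bound are invented for illustration:

```python
from fractions import Fraction

def recover_ratio(published, max_denominator):
    """Closest fraction to `published` whose denominator is at most the bound."""
    return Fraction(published).limit_denominator(max_denominator)

# Hypothetical case: the secret integers were 355 and 113, and only their
# ratio was released, to eight decimal places.
published = "3.14159292"
print(recover_ratio(published, 1000))  # 355/113
```

Note that this only recovers the ratio in lowest terms. If the secret integers shared a common factor (say 710 and 226), the ratio alone cannot tell you that.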

flerchin 10y ago

Good. Fuck the liars and cheats right in their... careers.
