It's facetious, of course, but there was a serious point behind it all. There is a certain tendency in science for researchers to perform the same study over and over again, just using larger or slightly modified data sets, simply because that's what they know how to do. Most of the time these sorts of Version 2.0 studies just reduce the error bars on the result without telling anyone anything new.
Now, of course, sometimes interesting results do come from such things. But much more often interesting results come from studies that attack a radically different problem or use a radically different approach. Science is a manpower-limited, not a data-limited, endeavor. Scientists have a finite amount of time that they can devote to research, and they have to choose what projects to work on. There is still a great deal of low-hanging fruit---projects that require relatively small amounts of funding, relatively small amounts of manpower, and have the potential to yield genuinely new results. There are, for example, some really excellent projects being done with a "telescope" that basically consists of a commercial camera lens on a telescope mount [1]. But the difficulty with these sorts of projects is that they require creativity, and that is hard to come by. I'm not faulting anyone, though---I'm not an especially creative researcher myself!
Part of the problem is that grant agencies have a strong bias towards funding incremental science. While they say that they are in favor of funding breakthrough science rather than incremental science, the projects that actually get funded tell a different story. And it's hard to blame them because no one knows a good way to predict breakthrough results. It's an especially difficult problem to solve for theorists---in order to write a compelling theory proposal you basically have to have solved the problem already!
I've heard a number of solutions to these problems, but they're all about as compelling to me as a year-long data moratorium (which, to be fair, would indeed force the community to become more creative). Hmm, maybe I'll actually write up that paper for April 1, 2015.
[1] http://www.astronomy.ohio-state.edu/~assassin/index.shtml
Those projects tend to come from individual researchers working on an idea, and there's basically no money available for that. For some fields, this isn't feasible, of course. You can't do high-energy experimental physics in your office with a laptop. But quite a lot of research can be at least partially done without massive amounts of support, and the system is set up in a way that more or less prevents that from happening.
1. One area where we could stop is useless data-mined correlation studies that show statistical significance (assuming you ignore that data mining has occurred) between action X and outcome Y - the sort where a retrospective study of 500,000 nurses finds that eating candied peanuts reduces prostate cancer by 15%. The rule of thumb in any of these studies is that unless the effect is 300% or greater (for smoking and lung cancer it is 1500%), the result is almost certainly garbage.
2. We need less “novel” research and more replication of past results. The whole scientific system is set up to reward novelty over accuracy. It is so bad that unless I have seen two independent groups repeat a result, I doubt it is real, no matter how famous the group.
3. We need to reward being right over being first. Right now groups rush papers out so they don’t get scooped, and so they don’t check their results as well as they should. I would personally like to remove the date from all scientific papers to stop these silly games - after all, if something is true, does it become less true just because it was published last year rather than last week?
4. We need to reward people who put the effort into replicating work. A simple proposal would be to give publication rights in the same journal to every group that replicated (or could not replicate) a study. If a study is published in Nature and you go to the effort of replicating it, then you should get an automatic Nature publication.
5. Stop scientists from holding on to raw data. In theory scientists are supposed to share their data, but in practice this doesn’t happen very often [1]. It should be possible to report groups that don’t share data to the funding bodies, and if they are found not to be sharing (or to be sharing only some of the data), then the group is banned from getting any new funding. It would only take a few bans to stop this immoral data hoarding.
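The point in item 1 about data mining can be illustrated with a small simulation. This is a hypothetical sketch (the numbers and names are mine, not from any real study): it tests 200 made-up "food vs. outcome" pairs on pure noise and counts how many clear the usual p < 0.05 bar anyway.

```python
import random
import statistics

random.seed(0)

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((len(xs) - 1) * sx * sy)

n = 100        # subjects per comparison (hypothetical)
tests = 200    # number of food/outcome pairs "mined" (hypothetical)
# With n = 100, |r| > ~0.197 corresponds to p < 0.05 under the null hypothesis.
threshold = 0.197

false_hits = 0
for _ in range(tests):
    food = [random.gauss(0, 1) for _ in range(n)]     # pure noise
    outcome = [random.gauss(0, 1) for _ in range(n)]  # pure noise
    if abs(correlation(food, outcome)) > threshold:
        false_hits += 1

# Roughly 5% of the tests come up "significant" despite there being
# no real effect anywhere - each one a publishable-looking headline.
print(false_hits)
```

Uncorrected, each spurious hit looks exactly like a real 15%-scale effect, which is why the comment's demand for very large effect sizes is a sensible filter.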
If investigators were forced to immediately release their raw data from these studies, there would be armies of other investigators swooping in to scoop the original team on follow-on studies from the data. While this would certainly be great for science, it partially punishes investigators for actually conducting the large trials. I'm not sure how justifiable it would be to put in the effort to conduct a large clinical trial and then only get 1-2 papers out of it (even if they went into NEJM / JAMA / Lancet etc).
What are your thoughts?
[1] For those outside of science: what happens now is that groups holding the data keep it back and then use access to it to establish “collaborations” - basically, they will give you access to the data as long as you put their names on any resulting papers. The people with the data often don’t actually contribute anything to the new publication other than access to the data and their names - my old boss was an expert at doing this.
This is the reverse of a rule of thumb I find useful: if you wish to measure something and get an approximate picture of your uncertainty, measure it 7-8 times.
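That rule of thumb can be made concrete with a few lines of code. This is a minimal sketch with made-up readings (the values and units are hypothetical): from eight repeats you get a mean, a sample standard deviation, and a standard error of the mean, which is already a usable first estimate of your uncertainty.

```python
import statistics

# Eight hypothetical repeated readings of the same quantity (arbitrary units).
readings = [9.82, 9.79, 9.85, 9.80, 9.83, 9.78, 9.84, 9.81]

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)        # sample standard deviation (n-1)
stderr = stdev / len(readings) ** 0.5     # standard error of the mean

print(f"{mean:.3f} +/- {stderr:.3f}")     # prints 9.815 +/- 0.009
```

With only 7-8 readings the error bar itself is rough (it has large uncertainty), but as the comment says, it's an approximate picture, not a precision measurement.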
The author's rule of thumb hinges delicately upon the definition of "readings", in particular upon the reach and precision of a given reading. I can look in the sky on dark nights and see Mercury, but even if I watch it through binoculars for years, I'll never resolve the "Genuine and Important" precession of its orbit [1], the first solid evidence for General Relativity.
Some important phenomena are subtle and rare. You could watch a liter of pure water for ~1500 years before you could expect a single neutrino from the Sun to interact and make a tiny flash of light [2].
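Taking the comment's ~1500-year figure at face value, detection is well modeled as a Poisson process, and the arithmetic shows why detector size matters. This is a sketch under that assumption (the rate and the 50-million-liter detector volume, roughly the scale of Super-Kamiokande, are illustrative numbers, not from the comment):

```python
import math

# Assumed rate: ~1 solar-neutrino interaction per liter of water per 1500 years.
RATE = 1.0 / 1500.0  # interactions per liter per year

def p_at_least_one(liters, years):
    """Poisson probability of seeing >= 1 interaction in a volume over a time."""
    return 1.0 - math.exp(-RATE * liters * years)

print(round(p_at_least_one(1, 1500), 3))        # one liter, 1500 years -> 0.632
print(round(p_at_least_one(50_000_000, 1), 6))  # ~50,000-tonne detector, 1 year
```

Waiting the full mean time with one liter still only gives a 1 - 1/e ≈ 63% chance of a single event; scaling the volume up by tens of millions makes events essentially continuous, which is the whole design logic of large neutrino detectors.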
[1] http://en.wikipedia.org/wiki/Tests_of_general_relativity#Cla...
Is this maybe how researchers publish negative results without having to admit failure? We often complain about the dearth of published negative results. We talk about pre-registering studies and so forth.
It seems better to me for researchers to recast a negative result as an inconclusive positive result "requiring more study", than to not publish it at all. Just because there is a call for further research doesn't mean we have to do it.
Paper-based systems are also failure-prone and unfit for purpose. They just fail in familiar ways that the old guard have accepted as just part of the business.
As the saying goes, to err is human, but to really foul things up requires a computer.
But dear god... some people manage to make some awful software.
It strikes me that making the scientific literature machine-parsable and query-able may help a great deal.
Currently the literature is "scraped" to produce scientific metadata which is stored in databases such as PubMed. Of course, that's back to front. Experimental data, findings, methods, workflows, and so on should be stored in databases of some sort, and the "literature" produced by querying the data.
A pipe-dream, of course. But some steps have been taken towards something approaching this.
https://sdm.lbl.gov/sdmcenter/
http://authors.library.caltech.edu/28168/
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5...
Of course more research is still needed in many areas anyway.