This is practically required by reviewers and editors. If you wade into a topic area, you need to review the field and explain where you fit in, even though you know full well many of those key citations are garbage. You basically need to pay homage to the "ground breakers" who claimed that turf first, even if they did it via fraud. They got there first, got cited by others, and so are now the establishment you are operating under.
And making a negative reference to them is not a trivial alternative. For one thing, you need to be certain, not just deeply suspicious of the paper, which adds work; and taking a stand may bring a fight with reviewers that hurts you anyway.
Even referring to it as “science” is fraudulent. Testable theories and repeatable outcomes, anyone? Time this whole field was defunded.
"It should be noted that the results cannot be estimated using a physician fixed effect due to a numeric overflow problem in Stata 15 which cannot be overcome without changing the assumptions of the logit model."
... The sad part was they didn't even choose a reasonable model in the first place.
The Stata thing was just one of many, many red flags.
Edit: given all this talk of reproducibility, I wonder what percentage of cutting-edge ML research is reproducible (whether because of non-public training sets or the compute required).
Other CS subfields that get a lot of criticism are "network science" and bioinformatics.
Clinical trials can often be flawed, even if the stats are fine, just in how they sample. For example, women are often excluded from trials due to hormonal changes, but how drugs impact women is really important! Participants are also typically drawn from specific locations, and so may not be representative of people with different diets, lifestyles, and environmental factors.
Physics has its own controversies, though not always directly related to replication. For example, Harry Collins recounts the social factors involved in the discovery of gravitational waves: https://blogs.sciencemag.org/books/2017/03/28/harry-collins-...
"If the original study says an intervention raises math scores by .5 standard deviations and the replication finds that the effect is .2 standard deviations (though still significant), that is considered a success that vindicates the original study!"
Why the exclamation point here? The replication study isn't magically more accurate than the original study. If the original paper finds a 0.5 standard deviation effect and the replication study finds a 0.2 standard deviation effect, that increases our confidence that a real effect was measured, but there's no reason to believe the replication is more accurate than the original. Maybe the true effect is smaller than measured, but maybe not. So yes, it should be considered a success.
1. Effect size is the most important thing. The point of the study is (usually) to guide decisions. Sticking with the article's example, let's say combining both studies shows the increase is likely 0.35 standard deviations. Is the intervention still worth the cost? Is it still the best option?
2. If there's enough data (e.g., an observational study) or a good chance of omitted variables, there's going to be a "statistically significant" difference. No matter what's measured. I would bet my life's savings there's a statistically significant difference in profits of New York businesses depending on whether the owner's named Jim or Bob. A replication of the experiment with all Jim and Bob businesses in another state would also guarantee significance. So it's a coin toss whether the second study would "successfully replicate" the same direction of effect.
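To make the point concrete, here's a minimal sketch (my own toy numbers, not real business data): with a large enough sample, a gap worth a hundredth of a standard deviation still clears the p < 0.05 bar.

    # Toy illustration: with enough data, a practically meaningless
    # difference still comes out "statistically significant".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 500_000  # imagine this many businesses per owner name
    jim = rng.normal(loc=100.0, scale=20.0, size=n)   # profits, arbitrary units
    bob = rng.normal(loc=100.2, scale=20.0, size=n)   # true gap: 0.01 SD

    t, p = stats.ttest_ind(jim, bob)
    print(f"observed gap = {bob.mean() - jim.mean():.2f}, p = {p:.2e}")
    # p lands far below 0.05, yet the effect is useless for any decision.

And with the gap set to exactly zero, roughly one run in twenty would still come back "significant", which is the other half of the problem.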
That doesn’t mean they’re wrong, necessarily. Overcoming inertia is a huge challenge. Daunting, even.
Overall, the condescending trash talking in this article led me to flag it.
An exclamation point as criticism?
>How dare he wear tweed, his argument is invalid.
>The replication study isn't magically more accurate than the original study. If the original paper finds a 0.5 standard deviation effect and the replication study finds a 0.2 standard deviation effect, that increases our confidence that a real effect was measured
It also increases our confidence that the effect is small enough to be ignored. You can't pretend that the two studies are independent of each other. The second is a direct result of the first, and you need to use Bayesian methods to calculate your belief in the result. The questions 'is there an effect?' and 'is the effect size >= 0.5 sd?' give you two vastly different probabilities and vastly different policy responses.
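A quick sketch of what that Bayesian reasoning looks like (the standard errors below are made-up assumptions, just for illustration): pool the two estimates into a posterior, then ask the two questions separately.

    # Combine the two noisy estimates (0.5 and 0.2 SD) under a flat prior with
    # normal likelihoods; standard errors are assumed for illustration only.
    import numpy as np
    from scipy import stats

    estimates = np.array([0.5, 0.2])    # original study, replication (SD units)
    std_errs  = np.array([0.15, 0.10])  # assumed standard errors

    precisions = 1.0 / std_errs**2
    post_mean = np.sum(precisions * estimates) / np.sum(precisions)
    post_sd = np.sqrt(1.0 / np.sum(precisions))

    p_any = 1 - stats.norm.cdf(0.0, post_mean, post_sd)   # "is there an effect?"
    p_big = 1 - stats.norm.cdf(0.5, post_mean, post_sd)   # "is it >= 0.5 SD?"
    print(f"posterior mean = {post_mean:.2f} SD, "
          f"P(effect > 0) = {p_any:.3f}, P(effect >= 0.5) = {p_big:.3f}")

With those assumed numbers the posterior sits around 0.3 SD: "there is an effect" is near certain, while "the effect is as big as originally claimed" is very unlikely, and the right policy response depends on which question you actually care about.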
As in "eats, shoots, and leaves", a little punctuation can totally change the meaning of a sentence. In this case, a period would have expressed agreement while an exclamation point expresses incredulity.
The best sociological research I've read was qualitative, though. Questionable replicability is of course built into this type of research, but it dealt with relevant questions. Most quantitative sociology seems rather irrelevant to me.
Another problem is of course that most quantitative sociologists don't have a clue what they are doing. They don't even know the basics of statistics and then use statistical methods they don't understand. It's some kind of overcompensation, I think. Although psychologists are even worse in this respect. It's really fun to watch a psychologist torturing SAS.
I write this as someone who was originally trained as a sociologist and over the years turned into a data scientist.
I ask because I’m enrolled in a research program in “computational humanities”. My initial feeling towards the program is that it’s kind of a sham.
Computational Humanities seems to be as computational as an accountant using Excel for their work. Not that I particularly mind, I’m not very interested in the computational aspect at all.
Why did you enroll in the program when you're not interested in the computational aspect? Or are you more interested in some kind of grand social theory/philosophy of computation? If you read German, Armin Nassehi "recently" published "Muster. Theorie der digitalen Gesellschaft" (Patterns: Theory of the Digital Society). He is not the first, but I find his stance interesting, based on several interviews; I haven't read the book though. Many sociologists deal with the Internet & AI, but I find those works less inspiring because they usually lack an adequate technical understanding. To me it often feels like bushmen theorizing about empty Coca-Cola bottles (you probably don't know the movie?).
It seems not to be based on actual replication results, but on predicted replication results? But then the first chart isn't even predictions from the market, but just the author's predictions?
The author clearly has a real hatred for practices in the social sciences. But I don't see any actual proof of the magnitude of the problem, the article is mostly just a ton of the author's opinions.
Is there any actual "meat" here that I'm missing? Or is all this just opinions based on further opinions?
Per https://www.replicationmarkets.com/index.php/rules, volunteers are predicting whether 3000 social science papers are replicable. According to the rules, of those 3000 papers, ~5% will be resolved (i.e. attempts will be made to replicate them). According to the article, 175 will be resolved. It's unclear to me who exactly will do that work, but I would guess it's the people behind replication markets dot com (they are funded by DARPA). The rules say that no one knows ahead of time which papers will be resolved, so I assume the ~5% (or 175) will be chosen at random.
The data in the article seems to be based on what the forecasters predicted, not which papers actually replicated (that work hasn't been done yet...or at least hasn't been made public). The author of the article is assuming that the forecasters are accurate. To back up this assumption, he cites previous studies showing that markets are good at this kind of thing.
The tone is ranty but, by participating in the markets, the author is putting his money where his mouth is.
The before curves are Gaussian+ distributed and pessimistic, but the after curves are all distinctly bimodal (or worse). This suggests that one population of participants was made broadly more pessimistic by their surveys and another was made broadly more optimistic.
This could instead be a measurement of how people's trust in science is predicated on how well it matches their own prior beliefs.
+ A sharper eye shows some of the prior curves aren't quite unimodal either. Even in those cases, though, the separation between the modes gets much wider afterward.
The only thing that worries me a little (or a lot, sometimes) is that there doesn't seem to be much "bone" for the meat to hang off of. That is, in physics, if your theory doesn't match experiment it's wrong, whereas in social science you're never going to have a (mathematical) theory like that, so you have to start (in effect) guessing. The data is really muddy, and thanks to recent (good) political developments, whatever conclusions can be drawn from it may not be acceptable in some people's eyes. For example, (apparently) merely commenting on the variability hypothesis can get you fired [https://en.wikipedia.org/wiki/Variability_hypothesis#Contemp...].
I majored in Mathematics, but out of curiosity I took some Psychology modules when I was in university. What I found baffling was their lack of attention to detail. They just seemed to have an intuitive model of their subject and kept reinforcing that intuition while overlooking any details that could have challenged it. Coming from a field where every symbol and punctuation mark matters, I realised that to psychologists the exact details of a curve don't seem to matter much as long as the general trend makes sense.
Someone who really impressed me was Dan Ariely, a behavioural economist. Even though I didn't see any mathematics in his lectures, I loved his approach to the field. I'd be quite happy if more of social science took a similar approach, even if they didn't back it up with rigorous mathematics.
He mentions changing the threshold for significance as a possible tweak, but the issue is something more fundamental. Humans have flaws, like political biases or a tendency to favor one's own hypotheses (confirmation bias). Humans also operate within systems whose incentives can push them away from truth seeking (publication bias). All this exacerbates the fundamental problem that statistical techniques are easy to manipulate. Virtually all academic (university) studies, in their published format, simply lack the necessary information, controls, and processes a reader would need to easily detect flawed statistical claims. Instead a reader has to blindly trust, assuming that data was not selectively included/excluded, that the parameters of the experiment were rigorously (neutrally) chosen, and so on. There is no incentive for the academic world to correct for this; there isn't, for example, a financial consequence for a decision based on bad statistics, as a private company might face.
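For a concrete feel of how easy the manipulation is, here's a toy sketch (my own example, not anything from the article): measure enough pure-noise outcomes and selective reporting will usually hand you something "significant".

    # Researcher degrees of freedom in miniature: 20 outcomes, no real effect
    # anywhere, report only whatever happens to clear p < 0.05.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_subjects, n_outcomes = 40, 20
    treatment = rng.normal(size=(n_subjects, n_outcomes))
    control = rng.normal(size=(n_subjects, n_outcomes))   # identical distribution

    pvals = [stats.ttest_ind(treatment[:, k], control[:, k]).pvalue
             for k in range(n_outcomes)]
    print("p-values below 0.05:", sum(p < 0.05 for p in pvals))
    print("smallest p-value:   ", round(min(pvals), 3))
    # At alpha = 0.05 you expect about one false positive per run of 20 tests,
    # and the published paper only has to mention that one.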
However, the damage has been done, and it doesn't matter if MOST work is done in good faith when the bad work has a big impact. As an example, IATs have been used to make claims about unconscious biases and form the academic basis of books like "White Fragility" by Robin DiAngelo. Quillette wrote about problems with White Fragility and the IAT as early as 2018 (https://quillette.com/2018/08/24/the-problem-with-white-frag...), and others continued to write about it as recently as 2020 (https://newdiscourses.com/2020/06/flaws-white-fragility-theo...). However, few people are exposed to these critical analyses; the flaws in the scientific/statistical underpinnings have not mattered and have not stopped books like White Fragility from circulating by the millions.
We need a drastic rethink of academia, the incentives within it, and the controls that regulate it to stop the problem. Until then, it’s simply not worth taking fields like social science seriously.
Most analyses of the problems in science are really analyses of the problems in academia. There's no iron law that says academia has to be funded at the level it is today, and for most of history it wasn't. And let's recall that these meta-studies are all about science, which is one of the better parts of academia. Once you get outside that into English Lit, gender studies, etc., the whole idea of replicating a paper ceases to be meaningful because the papers often aren't making logically coherent claims to begin with.
A lot of people look down on corporate funded science, but it has the huge advantage that discoveries are intended to be used for something. If the discovery is entirely false the resulting product won't work, so there is a built-in incentive to ensure research is telling you true things. The downside is there's also no incentive to share the findings, but that's what patents are for.
Of course a lot of social psych and other fields wouldn't get corporate funding. But that's OK. That's because they aren't really useful except for selling self-help books, which is unlikely to be a big enough market to fund the current level of correlational studies. That would hardly be a loss, though.
There were scientists who received financial backing from wealthy individuals in a manner not so different from how VCs operate today; Tesla among them.
Regardless, I tend to agree that science that exists for the sake of publishing, because publishing is a requirement of receiving grants, has diluted the respectability of science.
As an amusing nudge, I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict) and release that as a tool for authors to do some introspection on their experimental design (assuming they're not maliciously publishing junk).
> I bet you could do some ML to predict replicability of a paper (per author's suggestion that it's laughably easy to predict)
I am betting any such ML system could be gamed and addressing the issue would ultimately still need humans in the loop. For example, what if I am selective with my data, beyond the visibility of ML evaluating the final published paper? I don’t think this is “laughably easy” to predict. It may be easy to spot telltale signs today that predict replicability, but as soon as those markers are understood, I imagine authors will simply squeeze papers through the cracks in a different way.
Another issue is this bit from the author on Twitter:
> Just because it replicates doesn't mean it's good. A replication of a badly designed study is still badly designed. There are tons of papers doing correlational analyses yet drawing causal conclusions, and many of them will successfully replicate. Doesn't mean they're justified.
Just like with the Netflix Prize stuff, where the conclusion was very similar, i.e. just dump in as much data as you can, crank up the ML machinery, and it'll discover the features (better than you can engineer them) and learn what to use for recommendation ranking. And that's basically what we see with GPT-3 too. If there are useful labels in the data, it'll learn them even without supervision, because it has so many parameters that the signal basically sticks.
Get some papers, run them through a supervised training phase where every paper is scored by how retracted/bad/non-replicating it is, and you'll get a great predictor. Then run it to find papers that stick out, have a human look at them, and try to replicate some of them to fine-tune the predictor. Plus continue to feed it new replication results.
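As a rough sketch of that pipeline (the data loading is hypothetical; there's no such labeled corpus here, and a bag-of-words baseline stands in for the heavy ML machinery):

    # Assume `abstracts` is a list of paper abstracts and `replicated` is a 0/1
    # label from replication attempts or retractions; both are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    def train_replication_predictor(abstracts, replicated):
        """Fit a text baseline that scores papers by predicted replicability."""
        model = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2), min_df=5),
            LogisticRegression(max_iter=1000),
        )
        auc = cross_val_score(model, abstracts, replicated,
                              cv=5, scoring="roc_auc").mean()
        model.fit(abstracts, replicated)
        return model, auc

    # Papers the model scores as unlikely to replicate get a human review and,
    # ideally, an actual replication attempt, which then feeds back as a label.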
That said, using an ML system as the gatekeeper as OP suggested is a bad idea, as it'll quickly result in the loss of proxy variables' predictive power.
Though ultimately a GPT-like system has the capacity to encode "common sense".
This is the quiet part which most social scientists, particularly psychologists, don't want to discuss or admit: WEIRD [0] selection bias massively distorts which effects are inherent to humans and which are socially learned. You'll hear people today crowing about how Big Five [1] is globally reproducible, but never explaining why, and never questioning whether personality traits are shaped by society; it's hard not to look at them as we look today at Freudians and Jungians, arrogantly wrong about how people think.
[0] https://en.wikipedia.org/wiki/Psychology#WEIRD_bias
[1] https://en.wikipedia.org/wiki/Big_Five_personality_traits
The Big Five are pretty reproducible in part or in whole, but it's a strawman to say psychologists are "never questioning whether personality traits are shaped by society." That's just not true, nor is it even clear what that question means. Go to Google Scholar and search for "Big Five" and terms like "measurement invariance" or "cultural" or "social" or "societies" and take a look.
The Big Five are meant to be descriptive; the "why" is a different issue. (Just to explain it a different way: say you do unsupervised learning on cat images and find, over and over and over again, across decades and different databases, that the algorithms always return the same 5 types of cats, plus or minus a little. Wouldn't you make a note of it if you were interested in visual types of cats?) And it's important to remember that consensus around the Big Five didn't really form until the 90s (even today I'm not sure there's "consensus" around the Big Five).
I agree that there's a problem with the selection of participants, but the only way to fix that is to increase participation from the scientific community worldwide. And there are whole fields (cultural psychology) dedicated to the problems surrounding this issue.
The Freudian comparison is also worth commenting on in two respects: first, Freudians got in trouble for not pursuing falsifiable empirical research, which is simply not the case for the things you're talking about. Second, everyone loves to hate on Freud, but the basic tenets of unconscious versus conscious processes that sometimes conflict are still a bedrock of neurobehavioral research, including two-system theories ("fast and slow"), which won someone a Nobel prize and are a darling of cognitive researchers. There are legitimate discussions to be had about the utility of two-system theories, but those discussions are far more sophisticated than the criticisms I think you're referring to.
Given these foundational issues, it's folly to try to support Big Five or any other descriptive model just by saying that it's a good fit for the numbers. Any principal component analysis will find something which factors out as if it were a correlative component. This dooms Big Five just as reliably as it dooms g-factors or Myers-Briggs or any other astrology-like navel-gazing.
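A toy illustration of that point (mine, not the parent's data): PCA happily extracts "factors" from pure noise, so a stable-looking factor solution is not, by itself, evidence of underlying psychological structure.

    # 500 fake respondents answering 40 survey items that are pure noise.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    answers = rng.normal(size=(500, 40))

    pca = PCA(n_components=5)
    pca.fit(answers)
    print("variance 'explained' by five factors:",
          pca.explained_variance_ratio_.round(3))
    # Each component soaks up a bit of variance and has loadings you could
    # happily name, even though the data contain no structure at all.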
(If you want examples of actual five things showing up again and again and again, mathematics has them [3][4][5], but it turns out that when actual five things show up, the reaction is not to serenely admire the correlation but to admit terror before cosmic uncertainty. Psychologists do not seem to go insane and kill themselves the way statistical mechanicians or set theorists have; have they really seen the face of god?)
[0] https://en.wikipedia.org/wiki/Philosophical_zombie
[1] https://en.wikipedia.org/wiki/Dodo_bird_verdict
[2] https://en.wikipedia.org/wiki/Cartesian_theater
[3] https://en.wikipedia.org/wiki/ADE_classification
[4] https://en.wikipedia.org/wiki/Monstrous_moonshine
[5] https://en.wikipedia.org/wiki/Classification_of_finite_simpl...
[6] https://en.wikipedia.org/wiki/Hard_problem_of_consciousness
https://carcinisation.com/2020/07/04/the-ongoing-accomplishm...
> The interesting thing about the Five Factor Model is what it gets away with, in terms of being considered a theory, even though it is not causal, and makes no predictions. What counts as a “replication” of the Five Factor Model, as in Soto (2019), is the following: a correlation is found between one or more factors of the Five Factor Model and some other construct, and that correlation is found again in another sample, regardless of the size of the correlation. In almost all cases, and in 100% of Soto (2019)’s measures, the construct compared to a Big Five factor is derived from an online survey instrument.
> What counts as a “consequential life outcome” is also fascinating. In most cases, the life outcome constructs are vague abstractions measured with survey instruments, much like the Big Five themselves. For instance, the life outcome “Inspiration” is measured with the Inspiration Scale, which asks the subject in four ways how often and how deeply inspired they are. Amazingly, this scale correlates a little bit with Extraversion and with Open-mindedness. Do these personality traits “predict” the life outcome of inspiration? Is “Inspiration” as instrumentalized here meaningfully different from the Big Five constructs, such that this correlation is meaningful?
It seems built into human character to bite off far more than we can chew, as if it were free real estate, and then leverage the social value of holding something others are willing to compete for. I think it amounts to a social survival instinct, and I lament that there's very little chance of discouraging people from doing it, given the potential payoff. If anything, I think it's a failure of institutions to be built to exploit that competition rather than to guard against its excesses.
People who view themselves as rational/technical might be even more prone to not realizing how much they are affected by this? If your self-image is that you are a very rational person (more rational than others), you might be especially prone to denying, and therefore not being aware of, your biases.
Most new social science research is wrong. But the research that survives over time will have a higher likelihood of being true. This is because a) it is more likely to have been replicated, b) it's more likely to have been incorporated into prevailing theory or, even better, to have survived a shift in theory, and c) it is more likely to have informed practical applications or policy, with noticeable effect.
Physics and other hard sciences have a quick turnaround from publication to "established knowledge". But good social science is Lindy. So skip all the Malcolm Gladwell books and fancy psych findings, and prioritize findings that are still in use after 10 or 20 years.
Not if this article is to be believed! He claims that studies that could not be replicated are about as likely to be cited as studies which are. That implies the problem may instead get worse and worse, the structure more and more shaky as time goes on.
Here, the author seems to only look at recent papers, so we don't really get to see how the citation patterns have evolved over 10, 20, or 30 years. But even then, established ideas tend not to be cited at all: the concept of "knowledge spillovers", for example, is common in Economics and other fields, yet the original reference is rarely used. Other times, more established claims get encoded in a book or some work of theory, and people cite the theory rather than the paper that made the original claim.
Social science asks more of us than any other science. Physics demands that we respect electricity and not increase the infrared opacity of the atmosphere. Chemistry requires that we not emit sulfur and nitrogen compounds into the air. But the social sciences will not infrequently call for the restructuring of the whole society.
This is the "problem" with social science, or more properly, with the relationship between the social sciences and the society at large. When we call for "scientific" politics, it is a relatively small ask from the natural sciences, but it is a revolution -- even the social scientists themselves use this word -- when the social sciences are included in the list (Economics is no different). Psychology, as usual, falls somewhere in between.
So the relationship between the social scientists and the politicians may never be as cordial as the relationship between the natural sciences and the politicians. The "physics envy", where social scientists lament that they do not receive the kind of deference that natural scientists do, will have to be tempered by the understanding that the cost of such deference differs widely.
(All of this is ignoring that physics had a 200-year head start)
Meta-science has always been the gift of social science. This will all eventually funnel down elsewhere, just like meta-analysis.
But you're right, in that social science hits very close to home, more so than other sciences. Imagine that it suddenly worked very very well, and someone in the field of neuropsychology could manipulate behavior just like you might a lightbulb. Isn't that what critics are really asking for?
Physics does no such thing. It tells us that increasing the heat retained in the atmosphere increases the planet's surface temperature. It is a descriptive science, not a prescriptive one. Wanting industrial civilization to still be possible in the next century is why you don't increase the infrared opacity of the atmosphere. But that is a value judgment far outside the scope of physics, and one the social sciences claim is theirs by right of ... something.
The metaphors people use to think about the natural world are terrible, or as Carl Sagan put it Demon-Haunted.
The reason why physics, and other hard sciences, are so useful and respected is that you can switch dependent and independent variables around with a lot of success.
If I have the ideal gas law:
PV = nRT
Then I can rearrange it and be fairly confident it still works.
P = nRT/V
If you are an engineer this is a godsend. You want to set a hard value for P but can only directly control V or T? Try the second equation! You have a chance at succeeding without having to spend decades building machines that blow up and kill everyone around them!
Politicians see that and are jealous. Surely if those lame eggheads can get things to work like that we can too. So the social sciences give you equations as well. After a bunch of statistics we see that:
time spent in school = a*wealth - c
We can't control wealth, but we can control how long people spend in school:
wealth = (time spent in school + c)/a
So if we force everyone to stay in school until they are 50 everyone will have 20 million dollars in their bank accounts.
And to anyone who asks how this works, politicians say: Why are you against science and hate poor people?
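A toy simulation of that satire (my own made-up numbers, purely illustrative): in a world where wealth causes schooling, the fitted equation looks great and inverts algebraically, but mandating schooling does nothing to wealth.

    # Wealth is the cause, schooling the effect; then a politician inverts the fit.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 10_000
    wealth = rng.normal(50, 15, n)                   # arbitrary units
    school = 5 + 0.2 * wealth + rng.normal(0, 2, n)  # years of schooling

    a, c = np.polyfit(wealth, school, 1)             # fit: school ~ a*wealth + c
    print(f"fit: school = {a:.2f}*wealth + {c:.1f}, "
          f"r = {np.corrcoef(wealth, school)[0, 1]:.2f}")

    # Invert it and mandate 20 years of school for everyone.
    predicted_wealth = (20 - c) / a
    print(f"wealth 'predicted' by the inverted equation: {predicted_wealth:.0f}")
    print(f"actual mean wealth after the mandate:        {wealth.mean():.0f}")
    # Wealth never depended on schooling in this world, so it doesn't move.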
Causality is not established via tweaking a correlation or regression analysis, and we social scientists should know that.
I am not familiar with this work. What exactly makes a paper predictably replicable?
The story of Millikan's oil drop experiment replications and also James Randi's (and CSICOP's) battle with pseudo-scientists convince me of this.