> Why so angry? I know I’ve taken this far too personally. I have no illusions that everything I read online should be correct, or about people’s susceptibility to a strong rhetoric cleverly bashing conventional science, even in great communities such as HN. But frankly, for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
I am afraid I actually agree with the author's point. The anti-intellectual, anti-scientific streak in many poor analyses claiming to debunk scientific research is deeply concerning in our society. If someone is trying to debunk scientific research, they should at least learn some basic analytic tools. This observation is independent of whether the original DK paper could have been better.
That said, I give the benefit of the doubt to the author of "The DK Effect is Autocorrelation." It is a very human error to be overly zealous about an opinion without thinking it through.
There is no pat "trust science more" or "trust amateurs less" answer here. The actual answer is that if you want to understand research, you need to actually understand mathematical statistics and the philosophy of statistics fairly deeply. There just isn't any way around it.
On the other end there's distrust of broad scientific consensus across different professions, countries, etc. It's the distrust at these levels that is the increasing problem we are facing today.
That's a very simplistic take on it. Bad science is a necessary part of the process and dealt with accordingly by the scientific method.
The problem is that science that is bad or incomplete is being reported as fact or truth or, arguably even worse, as entertainment in order to gain an audience. This is what actually eroded trust in science, as people kept repeating things that reinforced and further warped their biases.
Human nature tends to distrust stuff we don't understand. Hence, our trust in science, which in many fields is often beyond the understanding of laymen, has to have constant reinforcement. However, the goal of media, and especially social media, is to attract eyeballs to their content, and the truth sells a lot less well than sensationalised content pandering to the audience.
Simply put, there is not much profit in reporting science truthfully, and every incentive to sensationalise it.
In between us (gen pop) and The Method are scientists, and scientists are just as fallible as any other group of people - lawyers, politicians, coders, shop assistants.
Alas, that's a hard sell to laypersons through the medium of soundbites and tweets.
I recall one study that said all white people are committing environmental racism against all non-white people. I dove in and read the whole thing wondering what method could have yielded scientific confidence in such a broad result. Turns out the model used was a semi-black box that required a request for access and a supercomputer to run. But it was in a Peer Reviewed Scientific Journal and had lots of Graduate Level Statistics so I guess it seemed trustworthy.
It is literally like this:
- someone makes a point that questions your belief
- you google a phrase that would appear in studies that prove otherwise
- you take the first thing that looks promising, skim the first page, and paraphrase a good bit in a way that makes your point
- you publish it as part of a post, youtube video or whatever
- danger averted
Bad studies play into this, but the same thing happens even when the studies are good, or when bad studies have been retracted. Andrew Wakefield, who published the original "combined vaccines cause autism" study after patenting a non-combined measles vaccine, eventually had his study retracted by The Lancet. He lost his status as a doctor, etc. And you will still find people who use his study as a source.
Of course, studies whose outcomes collide with our belief systems are always harder to trust than those that validate them — but this is why you look at the methods used and other indicators that might make a study bogus.
I don't think this is true. It is possible to put a lot of work into unsound statistics and to make a lot of "noise and fury" about how mathematical you are while failing some basic principle, but I don't think sound statistics can mislead. The replication crisis was caused by scientists not being rigorous and journals not forcing them to be. You absolutely cannot accept publication as a sign of sound techniques except in journal/field combinations that have a deserved reputation.
The book "How to lie with statistics" is one of the best statistics textbooks that I have read. It basically makes you immune to misleading stats (charts, tables, everything).
IIRC, the only very relevant thing missing from the book (it's a really old book) is p-hacking.
If only there were a term for "a cognitive bias whereby people with limited knowledge or competence in a given intellectual or social domain greatly overestimate their own knowledge or competence in that domain relative to objective criteria or to the performance of their peers or of people in general"
Anyway, "The DK Effect is Autocorrelation" definitely seems to be both statistically literate, and a good faith criticism of the Dunning-Kruger paper. In light of that, calling it "anti-scientific" seems unfair, since criticism and debate are an important part of science.
It does affect your conclusions though.
The choice of null hypothesis in "The DK Effect is Autocorrelation" determined how the random data was generated. The hypothesis is: "nobody has any clue whatsoever how competent they are". The random data was specifically crafted for that hypothesis.
The choice of null hypothesis in this article is: "everyone roughly knows how competent they are". This random data, too, is specifically crafted for the null hypothesis.
So what does this mean? If you pick one particular null hypothesis, you can try to argue that the DK effect is a statistical artefact. But it's not; the argument is itself an artefact of choosing that particular null hypothesis.
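To make the contrast concrete, here is a minimal sketch (Python/numpy; the noise level in the second model is an invented assumption, not a number from either article) of the quartile plot each null hypothesis produces:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
skill = rng.uniform(0, 100, n)  # actual test percentile scores

# Null hypothesis of "DK is Autocorrelation": self-assessment is
# pure noise, carrying no information about skill.
guess_noise = rng.uniform(0, 100, n)

# Null hypothesis of this article: people roughly know their own
# skill, up to some noise (the sd of 15 is an invented number).
guess_rough = np.clip(skill + rng.normal(0, 15, n), 0, 100)

def quartile_means(guess):
    """Mean self-assessment within each actual-skill quartile."""
    return [round(guess[(skill >= lo) & (skill < hi)].mean(), 1)
            for lo, hi in [(0, 25), (25, 50), (50, 75), (75, 100)]]

print("no self-knowledge:   ", quartile_means(guess_noise))
print("rough self-knowledge:", quartile_means(guess_rough))
```

Under the first null, mean self-assessment sits flat around 50 in every quartile, which is exactly the classic DK shape; under the second it tracks actual skill, leaving only a small residual gap at the extremes caused by clipping to the 0-100 scale.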
There are valid criticisms of the DK study, though. See this comment for example: https://news.ycombinator.com/item?id=31119196
> the p-value is a strongly nonlinear transformation of data that is interpretable only under the null hypothesis, yet the usual purpose of the p-value in practice is to reject the null. My criticism here is not merely semantic or a clever tongue-twister or a “howler” (as Deborah Mayo would say); it’s real. In settings where the null hypothesis is not a live option, the p-value does not map to anything relevant.
https://statmodeling.stat.columbia.edu/2017/01/07/we-fiddle-...
Behavioural science is a pretty new field; it's pretty easy to get aberrant results or manipulate the results to show 'something' statistically. Many findings in earlier papers could not be replicated, had applied statistics incorrectly, or showed different results when research participants were not white college kids.
This is a whole other problem within academia: the pressure to publish something even when there is nothing, and perceived legitimacy based on the number of citations a paper has. My professor always said: don't look at the number of citations; understand the method and the rebuttals. There were numerous low-citation but solid papers showing flaws in famous ones, but everyone who isn't deep into the subject holds the original assertion to be legitimate because it's "famous".
People endlessly reference the Dunning-Kruger effect as a meme, without ever having read the paper, let alone having checked its methods. You don't seem to have a problem with that.
On the other hand, after seeing an article that uses essentially statistical arguments to debate a scientific study you conclude that there is some "anti-intellectual, anti-scientific streak" in our society and that it should be of grave concern.
This doesn't make any sense except as an extreme case of virtue-signaling.
What's more harmful to medicine: a fashionably non-expert contrarian who doesn't understand the appropriate null hypothesis making a superficially plausible statistical argument that actually the trials suggest the drug is harmful to wide acclaim from laymen, or people casually referencing or even being administered the drug without reading the original trial writeups for themselves?
Everyone is free to question the results, but after actually reading the entire paper I can confidently say that poking a bit at the correlation in the charts falls way short of undermining the actual findings from the paper. The actual results are much more detailed and nuanced than two straight lines at an angle.
[1] https://www.researchgate.net/publication/12688660_Unskilled_...
1. It uses a tiny sample size.
2. It assumes American psych undergrads are representative of the entire human race.
3. It uses stupid and incredibly subjective tests, then combines that with cherry-picking:
"Thus, in Study 1 we presented participants with a series of jokes and asked them to rate the humor of each one. We then compared their ratings with those provided by a panel of experts, namely, professional comedians who make their living by recognizing what is funny and reporting it to their audiences. By comparing each participant's ratings with those of our expert panel, we could roughly assess participants' ability to spot humor ... we wanted to discover whether those who did poorly on our measure would recognize the low quality of their performance. Would they recognize it or would they be unaware?"
In other words, if you like the same humor as professors and their hand-picked "joke experts" then you will be assessed as "competent". If you don't, then you will be assessed as "incompetent".
Of course, we can already guess what happened next - their hand-picked experts didn't agree on which of their hand-picked jokes were funny. No problem. Rather than treat this as evidence that their study design might not be reliable, they just tossed the outlier:
"Although the ratings provided by the eight comedians were moderately reliable (a = .72), an analysis of interrater correlations found that one (and only one) comedian's ratings failed to correlate positively with the others (mean r = -.09). We thus excluded this comedian's ratings in our calculation of the humor value of each joke"
The fact that this actually made it into their study at all, that peer reviewers didn't immediately reject it, and that the Dunning-Kruger effect became famous, is a great example of why people don't (or shouldn't) take the social sciences seriously.
Oh, the irony in your last statement. Somebody who hasn't done social science research professionally (this is an assumption; let me know if I'm wrong) has difficulty judging what social science research can (and can't) do ...
Full disclosure - I was a sociological researcher before I started working in IT - and I would (I can appreciate the irony, given that all of this is about the DK effect) rate myself as very significantly above average in terms of methodological rigour and mathematical skill compared to other social researchers.
One thing that is taught to social researchers - although I've seen it much less with psychologists - is that social research is fundamentally different from natural sciences in that it is accepted as fundamentally subjective. Now, a radical such as myself will tell you that all research, including natural science, is not entirely objective due to very subjective navigation of selection bias, but putting that to the side - this is an extremely important point when evaluating social research.
Coming back to your original point - I would agree with the points you object to vis-à-vis the original DK effect paper. However, as a social researcher, I already come into reading that paper knowing that I'll have to take it with spoonfuls of salt. There is no need to write the paper in a way that puts in many of the disclaimers you might expect, because we are institutionally taught that these disclaimers apply.
Having said that - one of my peeves with social research, and why I ultimately went away, is that a lot of garbage goes on and gets through peer review. There is almost no proper testing of quantitative instruments and methods. Which is why I agree with your point that it rightfully isn't taken seriously - but I would object to your assertion that it shouldn't be taken seriously, especially amongst IT professionals who already have a bias against non-STEM fields. Point out the shortcomings and apply a different interpretive lens, rather than discounting the field completely. Social science could be better and taken seriously if it were held to a higher standard, even with the methodological shortcomings we have today - but it is very often discounted wholesale, which I don't think will incentivise the bubble that is forming around it to reform and get better.
So it seems people are bad at doing global rankings. If I tried to rank myself amongst all programmers worldwide, that seems really hard and I could see myself picking some "safe" above-average value just because I don't know that many other people.
There's also: if you took one piano class 30 years ago and can only play one simple song, that might put you in the 90th percentile worldwide just because most people can't play at all. But you might be at the 10th percentile amongst people who've taken at least one class. So doing a global ranking can be very difficult if you aren't exactly sure what the denominator set looks like.
So I think it's an artifact of using "ranking" as an axis. If the metric was, "predict the percentage of questions you got correct" vs. "predict your ranking", maybe people would be more accurate because it wouldn't involve estimating the denominator set.
Yes, and this literally implies that people in the lowest quartiles can't and won't rate themselves as being in the lowest quartiles when they are forced to give an answer. (Especially on tests that don't measure anything (getting jokes? really?), on tests where they have no basis for comparison (how would they know how their classmates performed on an IQ test???), or on tests that just have a high variance.)
And therefore they will "overestimate their performance".
It's like grouping a bunch of random people and forcing them to answer whether their house is short, average, or tall. The "people living in short houses" will "overestimate the height of their houses", while the "people living in towers" will humbly say they live in a house of average height.
Is this an existing and relevant psychological phenomenon, different from the general inability to guess unknown things? I don't think so.
If you think so, then give me proof.
> how would they know how their classmates performed on an IQ test???
Are you serious? If you're interacting with your classmates, you definitely should have some idea of how their intellectual capabilities differ between each other and also with respect to you. In a small class doing lots of things together, someone might even literally count their "ranking" on some metric that highly correlates with IQ, estimating that Bob, Jane and Mary are above me and Dan and Juliet are below me, so I'm at the 40th percentile.
It's not appropriate to treat these aspects as unknown things or unknowable things.
1) self-assessment is perfectly correlated with skill, or
2) completely uncorrelated.
I think neither of these makes sense as a null hypothesis.
The model you describe matches my intuition about what we should expect: people know something about their own skill level, but not everything.
And this is a true dichotomy: the "autocorrelative" effect doesn't require self-assessment to be completely uncorrelated with skill; any imperfect correlation produces some of it.
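One way to see the middle ground between the two extremes: keep self-assessment unbiased but imperfectly correlated with skill, and watch the quartile gaps grow with the noise alone. A sketch with made-up noise levels:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
skill = rng.uniform(0, 100, n)

for sigma in (5, 15, 30):
    # Unbiased self-assessment, imperfectly correlated with skill.
    guess = np.clip(skill + rng.normal(0, sigma, n), 0, 100)
    bottom, top = skill < 25, skill >= 75
    print(f"sigma={sigma:2d}:"
          f" corr={np.corrcoef(skill, guess)[0, 1]:+.2f},"
          f" bottom gap={(guess[bottom] - skill[bottom]).mean():+5.1f},"
          f" top gap={(guess[top] - skill[top]).mean():+5.1f}")
```

The correlation stays well above zero throughout, yet the bounded 0-100 scale converts pure noise into apparent overestimation at the bottom and underestimation at the top.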
When you learn something, you also learn some of the mistakes you can make. You then evaluate your performance against the mistakes you didn't make. Consider a piano player, or a figure skater: you have to know which figures are difficult to perform in order to evaluate a performance, and you don't know which ones are difficult until you have studied and tried to perform them.
It’s been argued before that this is the only reason that DK gained any notoriety; because it feels right, not because it is right. It’s the “just-world” theory: we want to believe that confident people are overcompensating.
Is it actually intuitive though? Consider your own example. Most people who don’t know piano or figure skating are well aware that they don’t know, and do not rate themselves highly at all. Would it be surprising to learn that people who don’t know any law or engineering don’t often hold any doubts about their lack of skill, and by and large are not deluded nor erroneously believe they’re great at these things they don’t know?
The DK paper didn’t measure knowledge-based skills like piano, figure skating, or law. It measured things like the ability to get a joke, and conversational grammar. How would you rate your own ability to get a joke? (Does this question really even make a lot of sense?)
It’s important that the methods in the DK paper focused on tasks that are hard to self-evaluate, because when people have tried to replicate DK with more well defined knowledge-based activities, they have often demonstrated the complete opposite effect, that there is widespread impostor syndrome, and skilled people underestimate themselves.
I think this case (real_skill = 0, perceived_skill = 0) is maybe a trivial case, and that the bit of truth the DK idea captures is that when someone with very little skill considers how much work it would take to become a 'fully skilled' version of themselves, they woefully underestimate it.
Picture someone in their first summer of mountain biking watching youtube videos of the best guys. Yes, you know you can't jump like they do, or turn as skillfully, but you're getting better each month. However, you still grossly underestimate how hard it is to get to that skill level.
At least that's my personal experience as someone thoroughly unskilled!
I'm sorry, but I have to comment on a word in this line: I feel that increasingly, "notoriety" is used when "notability" might be better.
A thief is notorious. A statistical effect is notable, imho.
But I agree with your basic point: I can't skate, much less figure skate. And I am pretty accurately aware of that fact despite my lack of skill.
Thinking about all the "IANAL" answers and the obviously wrong "legal" advice and opinions on legal topics you can come across on HN on a daily basis, I think law is a good example of people overestimating their knowledge.
Replace figure skating with any other sport, then, and try having a discussion about, e.g., a defeat of any soccer team. All of a sudden, everyone becomes a soccer coach, and everyone is able to critique individual players' performance, skill, and technique. If anything, this proves the DK effect rather well.
There are people who can't advance because they can't see the problem, and people who can't advance because they can't (or don't want to) correct it. The end result is the same though.
Conversely, it cannot go below zero either, so the error is most likely going to land above the actual skill line (overestimation, since it's clamped from below).
If you had never listened to a professional play piano before then you'd have no idea what level of performance is possible. Similarly, if you had never seen skilled skaters perform on TV.
But we have done these things, so it's obvious that they're doing something that's very difficult.
Maybe you don't fully appreciate the skill, though. You wouldn't do well as a judge who compares the performances of professionals. But comparing novices to professionals seems easy?
> But we have done these things, so it's obvious that they're doing something that's very difficult.
Sometimes the things we find most impressive, in a demonstration of a skill we don't have, aren't the most difficult things.
I remember being absolutely blown away by some aerial circus tricks and stunts I saw at shows. Later, I started studying and eventually performing myself, and it's often the case that the most crowd-pleasing stunts are some of the easiest to perform.
As a performer, you could always tell which members of the audience knew their stuff, because they'd be the only ones applauding the tricks that might not have looked so spectacular, but were actually the most difficult.
Taking the piano example: after 1-2 years of progressive learning, you can certainly give off the impression to somebody unfamiliar/untrained (including yourself, to an extent) that you are actually quite good: the intermediate stage. But after a while, when confronted with more and more challenging material, discovering different styles and fine-tuning your hearing, you at some point reach the very visceral and uncanny sensation of the countless possible roads you can now explore: the advanced stage.
Then, as a person who has lived in the world and has the normal physical skills of such a person, you probably think "whoa, how in the heck did they do that?" when you finally see it.
"And maybe there’s no contradiction - there’s always room for nuance, for finding out where the Dunning-Kruger effect is relevant and where it’s not. That can be done with more studies, but only if the authors manage to agree on assumptions and basic statistical practice."
I linked to the start of the video where he begins to build the idea. TLDR is he mentions Monet painting Impression Sunrise and how it was something that people have never seen before and it took a bit of time for it to blow people away--they needed to develop "new eyes" to see the genius. Adam then dives into this idea of "new eyes". I'm sure many of us have experienced this in our life and it was so nice to hear Adam unpack it.
> After two months in the bakery, you learned how to “see” clean.
> Code is the same way.
[1] https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...
If human cultures can be characterized as default-arrogant or default-humble, then it stands to reason that arrogant cultures will show a DK effect and humble cultures won't.
There is also the other end of the scale, where "skilled and unaware" occurs: people under-assessing their skill (presumably because they judge that most other people have similarly high skill levels).
I think your two "cultures" would shift the self-assessment line up or down on the graph (constant), but not affect the slope very much (multiplier). The line shape or line slope must change somewhat since values are limited (between 0 and 100).
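A toy check of that intuition (offsets and noise level invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
skill = rng.uniform(0, 100, n)
noise = rng.normal(0, 10, n)

for label, offset in [("arrogant", +15), ("humble", -15)]:
    guess = np.clip(skill + offset + noise, 0, 100)
    slope, intercept = np.polyfit(skill, guess, 1)  # fit guess ~ skill
    print(f"{label:8s} culture: slope={slope:.2f}, intercept={intercept:+.1f}")
```

The cultural offset mostly moves the intercept while the slopes of the two lines stay close to each other; clipping at the ends of the scale bends each line slightly, as the comment anticipates.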
When I did some cognitive behaviour therapy, I un-learned things like "all or nothing thinking" and the expectation that I could accurately predict the outcome of any course of action by modeling future performance off of a past failure.
A large portion of the HN audience really, really wants to think they're smarter than mostly everyone else, including most experts. Very few are. I'm certainly not.
Articles which "debunk" some commonly held belief, especially those wrapped in what appears to be an understandable, logical, followable argument, are going to be catnip here.
Articles like this are even stronger catnip. If a member of the HN audience wants to believe they're mostly smarter than mostly everyone else, that includes other members of the HN audience.
So, whenever I read an article and come away thinking that, having read the article, I'm suddenly smarter than a huge number of experts, especially if, like the original article, it's because I understand "this one simple trick!", I immediately discard that knowledge and forget I read it.
If the article is right, it will be debated and I'll see more articles about it, and it'll generate sufficient echoes in the right caves of the right experts. Once it does, I can change my view then.
I am not a statistician, or a research scientist. I have no idea which author is right. But, my spider sense says that if dozens of scientific papers, written by dozens of people who are, failed to notice their "effect" was just some mathematical oddity, that'd be pretty incredible.
And incredible things require incredible evidence. And a blog post rarely, if ever, meets that standard.
It's not that at all. The assumption should be that everyone is equally good (or bad) at assessing their performance - not that they have no ability, but that the means between groups are the same vs. not the same; that the ability to assess themselves is independent of performance.
Say that everyone is equally okay at assessing themselves, and gets within 0.1 of their actual performance (rated from 0 to 1). Then X and Y are going to be very correlated, as X - 0.1 < Y < X + 0.1. But X - Y will look like a random plot, since Y is randomly sampled around X.
The only case where X and Y wouldn't correlate at all is if people have no ability to assess their performance (i.e., Y isn't sampled around X, but is instead sampled from a fixed range).
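A sketch of exactly that scenario, using the comment's own 0.1 band:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 10_000)            # actual performance
y = x + rng.uniform(-0.1, 0.1, 10_000)   # estimate within 0.1 of actual

print(np.corrcoef(x, y)[0, 1])       # ~0.98: X and Y strongly correlated
print(np.corrcoef(x, x - y)[0, 1])   # ~0.00: the error is independent of X

y_blind = rng.uniform(0, 1, 10_000)       # no self-knowledge at all
print(np.corrcoef(x, x - y_blind)[0, 1])  # ~0.71: only now X-Y tracks X
```

Only in the no-self-knowledge case does the error X - Y start tracking X.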
The less you know, the more random your guess at your own knowledge is. The actual value is low and less than zero isn't an option, so this drags the average up consistently.
The more you know, the more accurate your guess of your knowledge is. Especially as you hit the limits of the test, this noise can only drag the average down, but less dramatically than the other case.
With the reasonable conclusion: We all suck at guessing how much we know, but the more you know the less you suck until you hit the limits of the framework you are using for quantization of knowledge.
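That mechanism is easy to simulate. A minimal sketch, assuming guessing noise that shrinks with skill and a score clamped to the 0-100 scale (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
skill = rng.uniform(0, 100, n)

# Invented assumption: guessing noise shrinks with skill
# (sd 30 at skill 0 down to sd 5 at skill 100), and guesses
# are clamped to the 0-100 range of the test.
sd = 30 - 0.25 * skill
guess = np.clip(skill + rng.normal(0, 1, n) * sd, 0, 100)

for lo, hi in [(0, 25), (25, 50), (50, 75), (75, 100)]:
    m = (skill >= lo) & (skill < hi)
    print(f"skill {lo:3d}-{hi:3d}: mean guess - mean skill ="
          f" {(guess[m] - skill[m]).mean():+5.1f}")
```

The bottom quartile gets dragged up by several points while the top barely moves down: the asymmetry appears without anyone being systematically overconfident.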
There are ways to fix this:
- Throw out the extreme high and low ends of the data, because the model breaks down there (which results in a very boring result).
- Have people guess their score and a rough level of confidence alongside it (just a 0-5 sort of thing) and see what happens.
Note that I actually do think, from my own experience, that the effect is real, but the arguments presented fail to prove it statistically, because the model breaks down at the extremes where the effect is detected.
I’m not a statistician but I do have some basic training in psychometrics. It might be interesting/helpful to point out that your priors about self-assessment seem more reasonable generally but also put a lot of faith in the test’s validity as a measure of skill.
I’m relying on intuition here, but it seems a little problematic that the actual score and the predicted score are both bound to the same measurement scheme. Given that constraint, on some level we’re not really talking about an external construct of skill, just test performance and whether people estimate it well. Which is different from estimating their skill well.
Maybe someone with more actual skill can elaborate or correct haha.
It serves mostly as a way of reassuring themselves of their own superiority. The message (for them) basically amounts to "other people's claim to knowledge is just further proof that they don't know anything."
I feel like I’m honestly yet to see somebody make DK accusations in a way that’s not totally cringe.
I recall that either Dunning or Kruger once made a remark to that effect. That rather than an indictment of stupid people, it would be better to view it as a warning to those who consider themselves the smart ones.
Unfortunately the original article isn't very clearly explained, and it's only on reading the discussion in the comments under it that it becomes clear what it's actually saying.
The point is about signal & noise. Say your random variable X contains a signal component and a noise component, the former deterministic and the latter random. Say you correlate Y-X against X, and further say you use the same sample of X when computing Y-X as when measuring X. In this case your correlation will include the correlation of a single sample of the noise part of X with its own negation, yielding a spurious negative component that is unrelated to the signal but arises purely from the noise. The problem can be avoided by using a separate sample of X when computing Y-X.
The example in the original "DK is autocorrelation" article is an extreme illustration of this. Here, there is no signal at all and X is pure noise. Since the same sample of X is used a strong negative correlation is observed. The key point though is that if you use a separate sample of X that correlation disappears completely. I don't think people are realising that in the example given the random result X will yield another totally random value if sampled again. It's not a random result per person, it's a random result per testing of a person.
This is only one objection to the DK analysis, but it's a significant one AFAICS. It can be expected that any measurement of "skill" will involve a noise component. If you want to correlate two signals both mixed with the same noise sources you need to construct the experiment such that the noise is sampled separately in the two cases you're correlating.
Of course the extent to which this matters depends on the extent to which the measurement is noisy. Less noise should mean less contribution of this spurious autocorrelation to the overall correlation.
To give another ridiculous, extreme illustration: you could throw a die a thousand times and take each result and write it down twice. You could observe that (of course) the first copy of the value predicts the second copy perfectly. If instead you throw the die twice at each step of the experiment and write those separately sampled values down you will see no such relationship.
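Both the noise argument and the die analogy fit in a few lines (noise levels invented; note there is real signal here, since the self-assessment genuinely tracks skill):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
signal = rng.uniform(0, 100, n)       # true skill
y = signal + rng.normal(0, 10, n)     # noisy but genuine self-assessment

x1 = signal + rng.normal(0, 20, n)    # one noisy measurement of skill
x2 = signal + rng.normal(0, 20, n)    # an independent second measurement

print(np.corrcoef(x1, y - x1)[0, 1])  # ~ -0.5: shared noise sample, spurious
print(np.corrcoef(x1, y - x2)[0, 1])  # ~  0.0: separate sample, it vanishes

# The die analogy: one throw written down twice vs. two separate throws.
d1 = rng.integers(1, 7, 1000)
d2 = rng.integers(1, 7, 1000)
print(np.corrcoef(d1, d1)[0, 1])      # exactly 1.0 by construction
print(np.corrcoef(d1, d2)[0, 1])      # ~ 0.0
```

The spurious negative correlation comes entirely from the shared noise sample in x1, and resampling the measurement removes it.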
What you're saying is that we need to verify the statistical reliability of the skill tests DK gave, and to some extent that we need to scrutinize the assumption that there indeed is such a thing as "skill" to be measured in the first place. I hope we can both agree that skill exists. That leaves the test reliability (technical term from statistics, not in the broad sense).
What's simulated by purely random numbers is tests with no reliability whatsoever. Of course if the tests DK gave to subjects don't actually measure anything at all, the DK study is meaningless. If that's what the original article's author is trying to say, they sure do it in a very roundabout way, not mentioning the test reliability at all. I'd be completely fine reading an article examining the reliability of the tests. Otherwise, I again fail to see how the random number analysis has anything to do with the conclusions of DK.
In fact, DK do concern themselves with the test reliability, at least to some extent. That doesn't appear in the graph under scrutiny but appears in the study.
If you assume the tests are reliable, and you also assume that DK are wrong in that people's self-assessment is highly correlated with their performance, and generate random data accordingly, you'll still get no effect even if you sample twice as you propose.
> The key point though is that if you use a separate sample of X that correlation disappears completely
A separate sample of X, under the assumption of no dependence at all on the first sample, i.e., assuming there is no such thing as skill, or assuming completely unreliable tests. So, not interesting assumptions, unless you want to call the test reliability into question, which neither you nor the author are directly doing.
My understanding is that the hypothesis is "Those who are incompetent overestimate themselves, and experts underestimate themselves".
DK says: True
DK is Autocorrelation says: ???
"I cant let go..." says: True?
HN says: also True?
Is there really any debate here? The "DK is Autocorrelation" article seems to be the only odd one out, and it's not clear if it even makes a proposal either way about the DK hypothesis. It talks about the Nuhfer study, but that seems like apples vs. oranges, since it buckets by education level. Then it also points out that random noise would also yield the DK effect. But that does not address the DK hypothesis either, and it would indeed be very surprising if people's self-evaluation were random!
So should my takeaway here just be that the DK hypothesis is True and that this is all arguing over details?
DK is Autocorrelation says: The DK paper is based on a false premise; we've got to disregard it
"I cant let go..." says: Actually, given that we assume people are somewhat capable of self-assessment, which is reasonable, "DK is Autocorrelation" is the one based on a false premise, and we should disregard that one instead, and not DK.
The DK hypothesis is the "double burden of the incompetent": "Because incompetent people are incompetent, they fail to comprehend their incompetence and therefore overestimate their abilities more than experts underestimate theirs"
Arguably the hypothesis that matches the data from the DK paper best is: "Everyone thinks they're average regardless of skill level"
The actual DK result (which is much criticized, but that's a different issue) was actually a pretty much linear relationship between actual relative performance and self-estimated relative performance, crossing over at about the 70th percentile.
Because there is more space below 70 than above, that also means that the very bottom performers overestimated their relative performance more than top performers underestimated theirs - not because of any "double burden" (overestimation didn't rise faster as one moved below the crossover), but just because there was more space below the crossover point.
> Arguably the hypothesis that matches the data from the DK paper best is: "Everyone thinks they're average regardless of skill level"
If there were a perceptual nudge toward average relative performance, you'd expect a crossover at the median with a slope below 1; instead, the nudge is toward a particular point above the average.
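The geometry is easy to check with a hypothetical linear model of the published plot (both numbers illustrative, not fitted to the paper's data):

```python
# Hypothetical linear model of the published plot: perceived
# percentile = c + b * actual, with slope b < 1 and the crossover
# (perceived == actual) pinned at the 70th percentile.
b = 0.35                 # illustrative slope
crossover = 70.0
c = crossover * (1 - b)  # 45.5

for actual in (10, 30, 50, 70, 90):
    perceived = c + b * actual
    print(f"actual {actual:2d} -> perceived {perceived:5.1f}"
          f" (error {perceived - actual:+5.1f})")
```

The bottom overestimates by far more than the top underestimates (here +39 vs. -13) even though the relationship is a straight line, purely because the crossover sits above the median.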
No, if you look at the graph[0] everyone thinks they are above average (over 50). The worst think they are a little above average and everyone else thinks they are better and better but increasing by less than the real difference.
At any rate, the issue seems to be with how people imagine everyone performs - they seem to think there are a lot of people who are really bad for a start, and seemingly a bit more people who are really good than there are (at least if we assume the results are accurate).
0. https://andersource.dev/assets/dk-autocorrelation/dk.webp
I hate it when people "solely" use statistics, and other first-principle-thinking approaches, to understand well-researched and documented topics. And I hate it when people use solely statistics to criticize research without considering its other aspects. Does it mean the DK effect can be discarded or not? I don't know; I think some disagreement over the statistical methods is not enough to come to any conclusion.
Attacking the Dunning-Kruger study only on statistical grounds looks like a prime example of the DK effect in itself...
Observation: if you rank via true "skill" and assume that for a particular instance the predicted performance and observed performance are independent but both have the true skill as their mean, you don't observe the effect. CC of 0.00332755.
If you rank via observed performance and plot observed vs predicted the effect is there. CC of -0.38085757.
This is assuming very simple Gaussian noise, which is not going to be accurate, especially as most of these tasks have normalised scores.
Edit: fixed wrong way around
https://colab.research.google.com/drive/1Vy7JjkywxwEP8nfR6oS...
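For readers who don't want to open the colab, here is a condensed sketch of the setup described above; the noise levels are an illustrative guess, which is also why the exact CC differs from the quoted -0.38:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
true_skill = rng.normal(0, 1, n)
observed = true_skill + rng.normal(0, 0.7, n)   # noisy test score
predicted = true_skill + rng.normal(0, 0.7, n)  # noisy self-assessment

gap = predicted - observed
print(np.corrcoef(true_skill, gap)[0, 1])  # ~ 0.0: no effect vs. true skill
print(np.corrcoef(observed, gap)[0, 1])    # ~ -0.4: effect vs. observed score
```

Shrinking the two 0.7s (i.e., making the test and the self-assessment more reliable) drives the second correlation toward zero, which is the point of the reply below.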
What your simulation includes and the original article didn't (and I didn't touch at all in my article) is the statistical reliability of the tests they administered. Where you got a CC of -0.38 you used equal reliability (/ unreliability) of the skill tests and self-assessments. You can see that as you increase the test reliability, the CC shrinks and the effect disappears.
I have no idea what the actual reliability of the DK tests is; they do seem to consider it, but maybe not thoroughly enough. In my view it's very fair to criticize DK from that angle. But that would require looking at the actual tests and their data.
My point being that any purely random analysis is based on assumptions that can easily be tweaked to show the same effect, the opposite effect, or no effect at all.
My hypothesis would be that some of the DK effect in the original paper may be down to an effect like this (as suggested in the original article), but asserting that it is completely invalidated because of it is premature. We'd need access to more data to verify that the level of reliability was sufficiently acceptable.
So my suspicion is that the DK effect is not really a symptom of people's inability to accurately self-assess, but of their unwillingness to accurately report that self-assessment.
And I don't think it is unique to self assessment either. It's common knowledge that ratings on a scale out of 10 for pretty much everything are nearly always between 6 and 9.
I don't know how they did the experiment but I bet they'd get different results if the self-assessments were anonymous and accuracy came with a big financial reward.
Anyway that's all irrelevant to the point of the article which I think is correct.
And that's the point the original article was trying to make ("The reason turns out to be embarrassingly simple: the Dunning-Kruger effect has nothing to do with human psychology. It is a statistical artifact — a stunning example of autocorrelation."), though that point does get lost a bit as it goes on.
I think this article gives a better summary of how the Dunning-Kruger effect probably isn't a psychological effect: https://www.mcgill.ca/oss/article/critical-thinking/dunning-...
This isn't true either. Statistical dependence does not determine or uniquely identify causal interpretation or system structure. See Judea Pearl's works (e.g. The Book of Why) for more on this.
People lacking the ability to self-assess is interesting psychologically. People can learn from experience in many other contexts. People can judge their relative position versus other people in many contexts. Why would they be so bad at this particular task? There could be a psychological underpinning.
Even if it turns out we have useless noise-emitting fluff in the place that would produce self-awareness of skill, that would be a psychological cause of a psychological effect. Not the ones that Dunning and Kruger believed they were seeing, but still.
Now, if you asked frogs for a self-assessment of skill, I would expect that data would not show any psychological effects.
Is there an existing and relevant phenomenon about people lacking the ability to self-assess that is true, proven, and not just trivia?
I do believe that people understand all the available information about their skills and performance, and they rate themselves according to it.
E.g., if they are asked whether they performed well on an IQ test compared to their classmates, they will produce noise (see, e.g., the article "I can't let go of.."), and if they have the results of the IQ test, they will be able to correctly calculate which quartile they are in.
Is there anything against this view?
1) an incompetent person is poorer than average at self assessment of their skill
2) as a person's competence increases at a skill, their ability to self-assess improves, until they become 'expert' which is defined by underappreciating their own skill (or overappreciating the skill of others)
3) DK is surprising (interesting) only when some incompetent persons who suffer from DK cannot improve their performance, presumably because their poor self-assessment prevents their learning from experience or from others.
4) Worse yet, some persons suffering from DK cannot improve their performance in numerous skill areas, presumably because their poor self-assessment is caused by a broad cognitive deficit (e.g. political bias), preventing them from improving on multiple fronts (which are probably related in some thematic way).
If DK is selective to include only one or two skill areas, as in case 3, that is not especially surprising, since most of us have skill deficits that we never surmount (e.g. bad at math, bad at drawing, etc). DK becomes surprising only in case 4, when we claim there is a select group of persons who have broad learning deficits, presumably rooted in poor assessment of self AND others — to wit, they cannot recognize the difference between good performance and bad, in themselves or others. Presumably they prefer delusion (possibly rooted in politics or gangsterism) to their acknowledgement of enumerable and measurable characteristics that separate superior from inferior performance, and that reflect hard work leading to the mastery of subtle technique.
If case 4 is what makes DK surprising, then DK certainly is not described well by the label 'autocorrelation' — which seems only to describe the growth process of a caterpillar as it matures into a butterfly.
The surprising thing about DK, to me at any rate, is how unvarying it is in application. Under DK, people who are poor at something never think "wow, I really suck at this", or if they do, they are such a minuscule part of the population that we can discount them.
I've known lots of people who were not good at particular things and did not rate themselves as competent at it, although truth is they might have claimed competence if asked by someone they didn't want to be honest with.
And this tracks for most skills. How good are you at tying your shoes? Probably average? Just how good can you get? Probably not that much better, all told. It is a clearly defined goal and likely has a limit on the skill you can build.
What about writing your name? Putting on your clothes? Making your bed? All things that are somewhat bound in just how good you can be.
Now, throw in something like "play the piano." Turns out, the expertise bar is much, much higher suddenly. But if you haven't been trying, how would you know?
The author says as much in this article:
> Why so angry? [...] [Frankly], for the last few years, the world seems to be accelerating the rate at which it’s going crazy, and it feels to me a lot of that is related to people’s distrust in science (and statistics in particular). Something about the way the author conveniently swapped “purely random” with “null hypothesis” (when it’s inappropriate!) and happily went on to call the authors “unskilled and unaware of it”, and about the ease with which people jumped on to the “lies, damned lies, statistics” wagon but were very stubborn about getting off, got to me. Deeply. I couldn’t let this go.
It's true, the previous article (https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...) was pretty harsh on the authors of the original paper:
> In their seminal paper, Dunning and Kruger are the ones broadcasting their (statistical) incompetence by conflating autocorrelation for a psychological effect. In this light, the paper’s title may still be appropriate. It’s just that it was the authors (not the test subjects) who were ‘unskilled and unaware of it’.
But on some level, the original paper sounds just as condescending and dismissive. It presents a scholarly and statistical framework for looking down on "the incompetent" (a phrase used four times in the original paper). In practice, most of the times I see the DK effect cited, it functions as a highbrow and socially acceptable way of calling someone else stupid, in not so many words.
Cards on the table, I've never liked DK discourse for this reason. It's always easy to imagine others as the "Unskilled and Unaware", and for this reason bringing DK into any discussion rarely generates much insight.
I think it's even worse than that: it's also a socially acceptable way of enforcing credentialism and looking down on others for not having a sufficiently elite education.
For example, if someone gave me (or you) a leetcode-style test, told me I'd be competing against a sample picked from the general population, and asked me how well I did, I'd probably rate myself near the top with high confidence.
Conversely, if my competitors were skilled competitive coders, I'd put myself near the bottom, again with high confidence.
Now, if I had to compete with a different group, say my college classmates or fellow engineers from a different department, I'd be in trouble: if I scored high, what does that mean? Maybe others scored even higher. Or if I couldn't solve half of the problems, maybe others could solve even fewer - the point is, I don't know.
In that case the reasonable approach for me would be to assume I'm in the 50th percentile, then adjust it a bit based on my feelings - which is basically what happened in this scenario, and would produce the exact same graph if everyone behaved like that.
No need to tell tall tales of humble prodigies and boastful incompetents.
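That heuristic is simple enough to simulate. A sketch in which everyone anchors at the 50th percentile and adjusts on weakly informative gut feeling (the weight and noise are invented):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
percentile = rng.uniform(0, 100, n)  # actual standing in the group

# The comment's heuristic: anchor at the 50th percentile, then adjust
# a bit on gut feeling. The 0.25 weight and sd of 10 are invented.
guess = np.clip(50 + 0.25 * (percentile - 50) + rng.normal(0, 10, n), 0, 100)

for lo, hi in [(0, 25), (25, 50), (50, 75), (75, 100)]:
    m = (percentile >= lo) & (percentile < hi)
    print(f"quartile {lo:3d}-{hi:3d}: actual {percentile[m].mean():4.1f},"
          f" guessed {guess[m].mean():4.1f}")
```

The quartile means come out as two rising lines that cross, with the bottom quartile "overestimating" and the top "underestimating" - the familiar DK picture, generated without any skill-dependent delusion.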
I find the use of quartiles suspicious, personally. It's very nearly the ecological fallacy[1].
> I’m not going to start reviewing and comparing signal-to-noise ratios in Dunning-Kruger replications
DK has been under fire for a while now, nearly as long as the paper has existed[2]. At present, I am in the "effect may be real but is not well supported by the original paper" camp. If DK wanted to they could release the original data, or otherwise encourage a replication.
[1]: https://en.wikipedia.org/wiki/Ecological_correlation [2]: https://replicationindex.com/2020/09/13/the-dunning-kruger-e...
1. Average self assessment coincides with true skill, but variance increases with low skill.
2. Average self assessment is biased, and the bias is positive when you are unskilled and negative when you're highly skilled.
These two situations would create indistinguishable DK-graphs. I don't understand how anyone can be sure on either (1) or (2) after seeing one instance of such a graph.
As I see it, the only way out for "DK positivists" is to say that the DK hypothesis is unrelated to the truth values of (1) and (2). Or, that there is other evidence making DK convincing.
Neither seems very plausible!
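A quick simulation of the two situations (parameters hand-tuned for illustration; only the qualitative shape matters):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200_000
skill = rng.uniform(0, 100, n)

# (1) Unbiased self-assessment whose variance grows as skill falls.
var_only = np.clip(skill + rng.normal(0, 1, n) * (40 - 0.35 * skill), 0, 100)

# (2) Genuinely biased self-assessment: up when unskilled, down when skilled.
biased = np.clip(skill + 0.15 * (60 - skill) + rng.normal(0, 8, n), 0, 100)

for name, guess in [("variance-only", var_only), ("biased", biased)]:
    gaps = []
    for lo, hi in [(0, 25), (25, 50), (50, 75), (75, 100)]:
        m = (skill >= lo) & (skill < hi)
        gaps.append(round((guess[m] - skill[m]).mean(), 1))
    print(f"{name:13s}: quartile gaps (guess - skill) {gaps}")
```

Both mechanisms produce positive gaps at the bottom shrinking toward zero or negative at the top; with only four aggregated points per study, telling them apart from one such graph is genuinely hard.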
It's definitely related to ecological fallacy in the sense that both underestimate relative error and inflate effect sizes.
If others can't replicate it entirely on their own, without "encouragement", then it isn't useful at all, and the original experiment can be safely ignored as irrelevant to humanity, along with any "prestige" associated with it.
This is not possible, so the self-assessment data will be random because it is a random guess... so it does not correlate with actual performance, or anything else for that matter. Hence, the DK effect has to be a result of faulty statistical analysis.
I believe we'd have completely different results if the question was framed differently: "how many do you believe you got right?". Then, more confident people, regardless of competence, would answer that they got more right and less confident people, again regardless of competence, would believe that they must have gotten more wrong than they did.
I stopped reading at this point. Someone that is so certain that they say “I simply won’t believe you.” is too self-assured to be worth paying much attention to.
> “Never assume dependence” gets so ingrained that people stubbornly hold on to the argument in the face of all the common sense I can conjure. If you still disagree that assuming dependence makes more sense in this case, I guess our worldviews are so different we can’t really have a meaningful discussion.
Hypothesis testing is concerned with minimization of Type I and Type II errors. In the Neyman-Pearson framework this calls for a specific choice of the null hypothesis. Of course, nothing prevents you from defining the sets for H0 and H1 as arbitrarily as you want, as long as you can mathematically justify your results.
It seems like the author fundamentally misunderstands the basics of statistics.
It bugs me that DK reached popular consciousness and gets misinterpreted and misused more often than not. For one, the paper shows a positive correlation between confidence and skill. The paper is very clearly leading the reader, starting with the title. The biggest problem with the paper is not the methodology nor the statistics; it’s that the waxy prose comes to a conclusion that isn’t directly supported by their own data. People who are unskilled and unaware of it is not the only explanation for what they measured, nor is it even particularly likely, since they didn’t actually test anyone who’s verifiably or even suspected to be incompetent. They tested only Cornell undergrads volunteering for extra credit.
Put differently, if everyone's estimate was exactly the mean, you'd still see a "DK effect".
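That one-liner claim is easy to verify:

```python
import numpy as np

score = np.random.default_rng(9).uniform(0, 100, 10_000)
guess = np.full_like(score, 50.0)  # everyone predicts the average

# The bottom quartile "overestimates" and the top "underestimates":
print((guess - score)[score < 25].mean())   # ~ +37.5
print((guess - score)[score >= 75].mean())  # ~ -37.5
```

The aggregated plot shows the full DK pattern even though nobody's estimate carries any information at all.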
It’s also an interpretation to focus on unskilled people as the explanation. DK’s data shows the very same effect on highly skilled people. The people in the top quartile were just as bad at self-estimating as the bottom quartile, yet the paper claims only the unskilled people were unaware!
I recommend reading the DK paper. It didn’t test any people of low ability, and it did not evaluate skill in absolute terms. The sample size was tiny. The kids who participated were all earning extra credit in a class (it’s a self-selecting population that might have excluded both A students and F students). The students were all Ivy League undergrads who might all overestimate their abilities precisely because they’re in a prestigious school and their parents told them they’re great. The paper didn’t test any actual low-IQ population. The paper has methodology problems when it comes to non-native English speakers.
It absolutely blows my mind that the paper is held up as evidence for some kind of universal human trait with such minuscule and completely questionable evidence. I have no doubt that some people overestimate their abilities in some situations. Like you, I’m sure, I’ve witnessed that. But as a commentary on all of humanity, I’m becoming convinced that the so-called DK effect does not exist, that they didn’t show what they claim to show. It doesn’t help that many replication attempts have not only failed to replicate, but have ended up showing the opposite effect: that for many kinds of skilled activities, people underestimate their own abilities.
If people wander off through the verbiage of any article, where the chatter isn't supported by data, sure, they'll tend to get speculation.
Imagine in the Dunning-Kruger chart the second plot (perceived ability) was a horizontal line at 70, which is not true but not far off from the real results. Now imagine I told you "did you know that, regardless of their actual score, everyone thought they got a 70?" That's a surprising fact.
I honestly think people take it way too seriously and apply it too generally. Quantifying "good" is hard if you don't know much about the field you're quantifying. Getting deep into a particular field is humbling -- Tetris seems relatively simple, but there are people who could fill a book with things _I_ don't know about it, despite playing at least a few hundred hours of it.
Does the humility gained by becoming an expert in one field translate to better self-assessment in other fields? I feel myself further appreciating the depth and complexity of fields I "wrote off" as trivial and uninteresting when I was younger as I get deeper into my own field (and see just how much deeper it is, too).
I think that often the opposite is true: people who become experts in one domain often assume that they are automatically experts in completely unrelated fields. I suspect that this is the cause of "Nobel disease": https://en.wikipedia.org/wiki/Nobel_disease
"there's nothing more annoying than a physicist encountering a new subject"
What I'm really missing is a plot of the data without the aggregation. I find it very strange that X is broken down into quartiles but Y isn't. And when put in quartiles, people estimated their skills relative to each other quite well: the line still goes up, and, from bottom to top, it would be a perfect X-to-X correlation.
In partial "defense" of the "autocorrelation" article, the author was in fact arguing against their own perceived definition of DK, not what most people consider to be DK. They just didn't realise it.
Which is an all too common thing to begin with. (that particular article pulled the same stunt with the definition of the word 'autocorrelation', after all).
I don't have this luxury in my life right now, but I admit that after reading the "original" post for almost the fourth time, I was really hoping someone would take the time to explain why/how the author could be completely wrong (or not).
Thanks.
The issue is not the way our brains generalize, but that you are using just one brain, one life's experience.
Except that what you've plotted there isn't the growth rate, but the absolute growth. Your argument for DK isn't convincing either; they claimed something much stronger than that we can't assess our own skills.
Daniel: > It’s not a “statistical artifact” - that will be your everyday experience living in such a world.
You can experience statistical effects. I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
Dunning-Kruger is precisely about the surprising result that people are bad at estimating their performance!
If you accept the 'D-K is autocorrelation' argument, you don't get to throw out the existence of the D-K effect: you are saying Dunning and Kruger failed to show that humans have any ability to estimate how skilled they are at all.
That seems like an even more radical position than the D-K thesis.
Isn't DK about estimating your performance relative to the rest of the population? To do that, you need to not only know your own performance but also everyone else's. To me, guessing the performance of others sounds quite difficult.
You are sort of smuggling in the assumption for example that Olympian medalist lifters, when asked how much they can deadlift, will have the same distribution of answers as people who never deadlift (but are aware that totally sedentary men can probably deadlift like 200lbs and totally sedentary women can probably deadlift like 150lbs). If this were true, it would be worth publishing a paper about it.
It's sort of surprising to me to read your comment because TFA is an extended rebuttal of your comment.
> I think a lot of controversy comes from how Dunning and Kruger's paper leads people to interpret the data as hubris on the part of low-performers, and the statistical analysis demolishes that interpretation. Not knowing how well you performed is not the same thing psychologically as "overestimating" your performance.
D-K actually found that low performers were less accurate at assessing their skill than high performers, and the article you refer to obviously did not find this effect in random data, so I'm not sure how it was demolished.
A sentence written by the author on, commented on by me on, and read by the HN community on devices which exist only thanks to 80-90 years of rigorous, statistics-based QA in engineering, especially mechanical/hardware engineering.
Anyhow, after spending years on a team filled with social science PhDs, I would not waste my time reading papers about statistical analysis done by social scientists.
> We don't [only] need [the formal discipline of mathematics known as] statistics to learn about the world.
Sure, there are things you can only functionally ascertain through statistical analysis. But not everything in the world needs rigorous statistics.
The first one means "throw it away". The second one means "add other things too".
And I think you are injecting the words "only", "there are" and "everything" here and there just to change the meaning of the sentences I quoted and wrote...
However, by "random data", the original blog post means that people's performance and their self-assessments are completely independent! In fact, this is exactly what the DK effect is saying -- people are bad at self-evaluating [2]. (More precisely, poor performers overestimate their ability and high performers underestimate their ability.) In other words, the premise of the original blog post [1] is exactly the conclusion of DK!
Looking at the HN comments cited [3] by the current blog post, it appears that the main point of contention from other commenters was whether the DK effect means uncorrelated self-assessment or inversely correlated self-assessment. The DK data only supports the former, not the latter. I haven't looked at the original paper, but according to Wikipedia [2], the only claim being made appears to be the "uncorrelated" claim. (In fact, it is even weaker, since there is a slight positive correlation between performance and self-assessment.)
So, my conclusion would be that DK holds, but it does depend on what exactly the claim in the original DK paper is.
[1] https://economicsfromthetopdown.com/2022/04/08/the-dunning-k...
[2] https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
Is it that hard to actually check the original paper before bothering to make such a claim? The original paper explicitly claims to examine "why people tend to hold overly optimistic and miscalibrated views about themselves".
Their claim that "If we have been working with random numbers, how could we possibly have replicated the Dunning-Kruger effect?" is the first blatantly false statement, and then the rest is built upon that so it can be safely disregarded.
It's easy to see this because while the effect is present if everyone evaluates themselves randomly, it's not present if everyone accurately evaluates themselves, and these are both clearly possible states of the world a priori, so it's a testable hypothesis about the real world, contrary to the bizarre claim in the paper.
Also, the knowledge that the authors published that article provides evidence for the Dunning-Kruger effect being stronger than one would otherwise believe.
Like similar analyses here, you don't factor in that DK is about bias. Of course you can't see bias when test score = self-assessment; "if everyone perfectly knows their score, then there is no bias in their assessment" is a tautology.
Most comments were splitting hairs on what _exactly_ the Dunning-Kruger effect was, plus some general nerd-sniping on how the original article was off base.
IMO it was something that fell flat on its own rather than something that needed a lengthy refutation, but I can understand that sometimes these things get under your skin.
Edit: see below - I meant opposite, not corollary.
Now with D-K the proposed problem is statistical autocorrelation, not systematic bias, due to lack of independence, as here:
> "Subtracting y – x seems fine, until we realize that we’re supposed to interpret this difference as a function of the horizontal axis. But the horizontal axis plots test score x. So we are (implicitly) asked to compare y – x to x"
Regardless, it's fairly obvious that D-K enthusiasts are of the opinion that a small group of expert technocrats should be trusted with all the important decisions, as the bulk of humanity doesn't know what's good for it. This is a fairly paternalistic and condescending notion (rather on full display during the Covid pandemic as well). Backing up this opinion with 'scientific studies' is the name of the game, right?
It does vaguely remind me of the whole Bell Curve controversy of years past... in that case, systematic bias was more of an issue:
> "The last time I checked, both the Protestants and the Catholics in Northern Ireland were white. And yet the Catholics, with their legacy of discrimination, grade out about 15 points lower on I.Q. tests. There are many similar examples."
https://www.nytimes.com/1994/10/26/opinion/in-america-throwi...
I am reminded of something my very accomplished PI (in the field of earth system science) confided privately to me once... "Purely statistical arguments," she said, "are mostly bullshit..."
It seems like you're roughly the only person who thinks this.