
mike_hearn 4y ago

Is there supplemental material I didn't notice? I only scan read it after the joke section but I can't find any mention of supplemental data anywhere. That's a problem because although you say the other tests are better, no information appears to be provided on which we can judge that.

Let's look at the second test. It's advertised as a "logic test". The description is:

> Participants then completed a 20-item logical reasoning test that we created using questions taken from a Law School Admissions Test (LSAT) test preparation guide (Orton, 1993).

That's the entire description of their method. So immediately, we can see the following problems:

1. Just like the joke test, there's no way to replicate this given the description in the paper. Which questions did they take and why? In turn this throws all claims that the DK study has been replicated into question.

2. The citation is literally a Cliffs Notes exercise for students. It's about memorization of answers to pass law exams, not an actual test itself designed to verify logical reasoning ability. Why do they think this is a good source of questions for testing logic? Law is not a system of logic; there's even a famous saying about that: "the life of the law is not logic but experience". If you wanted to test logical reasoning, a more standard approach would be something like Raven's Progressive Matrices.

Putting my two posts together there's a third problem:

3. Putting aside the obvious problems with subjectivity, their joke test is defined in an illogical way. They define a test of expertise (working as a comedian), select some people who pass this test and define them as experts, then discover that one expert would have been ranked by their own test as "incompetent but doesn't know it". Yet this is a contradiction, because this person was selected specifically because the researchers defined them as competent. Rather than deal with this logical contradiction by reframing the question they simply ignore it by discarding that comedian from their expert pool.

This is good evidence that DK themselves weren't particularly logical people, yet they claim to have designed a test of logic - a bold claim at the best of times. Ironically, it appears DK may be suffering from their own effect. They believe themselves to be competent at designing tests, yet the evidence in their paper suggests they aren't.


ohwellhere 4y ago

In my experience prepping and taking the test, I found the LSAT logic questions to be pretty good at assessing deductive reasoning.

They’re 100% divorced from law and are closer to puzzles of the nature of, “Six people sit at a table, four of whom are wearing hats, three of which are red, …”

danbruc 4y ago

I could not quickly find the LSAT preparation guide but I found some LSAT sample questions [1] and they seem suitable to assess reasoning abilities. Also I do not think that it really matters which questions you choose as long as they span a wide enough difficulty range so that you are able to separate participants.

[1] https://www.petersons.com/blog/sample-lsat-test-questions/

mike_hearn (OP) 4y ago

Hmm, do they? The logical reasoning test in that page is a question about lab rat studies on coffee+birth defects, and a hypothetical spokesperson's response that they wouldn't apply a warning label because the government would lose credibility if the study were to be refuted in future. You're then asked a multiple choice question:

1. Which of the following is most strongly suggested by the government’s statement above?

(A) A warning that applies to a small population is inappropriate.

(B) Very few people drink as many as six cups of coffee a day.

(D) Studies on rats provide little data about human birth defects.

(E) The seriousness of birth defects involving caffeine is not clear.

Given the structure of this question I assumed there'd be more than one right answer but apparently, the only "logical" answer is C.

Maybe the word logic is used differently in the legal profession, but this doesn't resemble the kind of logic test I'm used to. It's about unstated/assumed implications of natural language statements, i.e. what a 'reasonable' person might read into something, rather than some sort of tight reasoning to which logical laws could be applied. I can see why that's relevant for lawyers but it's not really about logic.

Still, let's roll with it. (A) and (B) are clearly irrelevant given the stated justification, strike those. But (C) and (D) appear to just be minor re-phrasings of each other. Why is C correct but D not? An implied assumption of the study is that rat studies provide a lot of data about human birth defects, and the government's position implies that they don't agree with that. D could easily be a reasonable subtext for that position. E could also be taken as a reasonable inference, that is, the government believes there's a risk the study authors are using an exaggerated definition of birth defect that voters wouldn't agree with, and that 'refutation' of the study would take the form of pointing out the definitional mismatch.

So if I was asked to score this question I'd accept C, D or E. The LSAT authors apparently wouldn't.

That said, the "analytical reasoning" sample question looks more like a logic test, and the logic test looks more like a test of analytical reasoning. But even their bus question is kind of bizarre. It's not really a logical reasoning test. It's more like a test to see if you can ignore irrelevant information. The moment they say rider C always takes bus 3, and then ask which bus {any combination + C} can take, the answer must be (C) 3 only. Which is the correct answer.

> I do not think that it really matters which questions you choose as long as they span a wide enough difficulty range so that you are able to separate participants.

The problems here are pointing at a fundamental difficulty: all claims about competence/expertise are relative to the person picking the definition of competent. In this case the tasks are all variants on "guess what the prof thinks the right answer is", which is certainly the definition of competence used in universities, but people outside academia often have rather different definitions.

So the questions really do matter. If the DK claim was more tightly scoped to their evidence - "people who think they're really good at guessing what DK believe actually aren't" - then nobody would care about their results at all. Because they generalized undergrads guessing what jokes Dunning & Kruger think are funny to every possible field of competence across the entire human race, they became famous.

pedrosorio 4y ago

> Given the structure of this question I assumed there'd be more than one right answer

I did not, since the question is explicit about there being one correct answer only: “Which of the following is most strongly suggested by the government’s statement above?”

> but this doesn't resemble the kind of logic test I'm used to. It's about unstated/assumed implications of natural language statements

Agreed.

> But (C) and (D) appear to just be minor re-phrasings of each other.

I think the key here is “there are doubts”. The government’s position stems from doubts on the conclusive nature of the study, that’s it. The statement doesn’t say anything about how much data studies on rats provide about human birth defects. If we’re being logical, studies on rats provide “no data” on human birth defects. Across many studies with different substances there may be a correlation (p(human birth defect | rat birth defect) = x), but an observation of birth defects on rats for a particular substance gives us data about rat birth defects, not human ones.

mike_hearn (OP) 4y ago

Ah yes - is vs are. You're right. I think I assumed there'd have to be >1 right answer after reading the options.

It's a remarkably poor question, but option (C) isn't about doubts on the conclusive nature of this specific study, but rather the nature of all studies on all animals. You could credibly argue (and I'd hope a lawyer would!) that no government would base policy on doubting all animal studies and that their position in this case must therefore be due to something about this specific study, e.g. the usage of rats, or the topic of birth defects, or both. So they could argue that (D) is the most logical answer.

Not that it really matters. Pretty clearly the LSAT authors are using the word logical in the street sense of "makes sense" or "sounds plausible" rather than meaning "based on an inference process that's free of fallacies". If DK based their test of competence on questions like this then it doesn't mean much, in my view.


danbruc 4y ago

The second question labeled analytical reasoning is probably closer to what you consider a logical reasoning question, maybe they picked questions more like those?

mike_hearn (OP) 4y ago

That's the issue - we don't actually know what they did. Which means their claims would have to be taken on faith.

Now, maybe other researchers designed different, more rigorous studies that are replicable and which show the same effect. That could be the case. The point I'm making here is that the DK paper isn't by itself capable of proving the effect it claims, and that you don't need a statistical argument to show that. Sanity checking the study design is a good enough basis on which to criticize it.

samhw 4y ago

> It's about memorization of answers to pass law exams, not an actual test itself designed to verify logical reasoning ability. Why do they think this is a good source of questions for testing logic?

I'm not sure what that has to do with anything? The paper doesn't claim to have anything to do with testing logic. It's about people's self-perception in relation to a task at which they are, or are not, competent. That task could be juggling watermelons or strangling geese for all it matters.

mike_hearn (OP) 4y ago

> The paper doesn't claim to have anything to do with testing logic.

The paper reports on the results of a 'logic' test administered to undergrads and uses this to define competence. It's a key part of their evidence that their effect is real.

> It's about people's self-perception in relation to a task at which they are, or are not, competent. That task could be juggling watermelons or strangling geese for all it matters.

The specific tasks matter a great deal.

The whole paper relies very heavily on the following assumption: DK can accurately and precisely tell the difference between competence and lack of competence. In other words, that they know the right answers to the questions they're asking their undergrads.

In theory this isn't a difficult bar to meet. They work at a school and schools do standardized testing on a routine basis. There are lots of difficult tasks for which there are objectively correct and incorrect answers, like a maths test.

But when we read their paper, the first two tasks they chose aren't replicable, meaning we can't verify DK actually knew the right answers. Plus the first task is literally a joke. There isn't even a right answer to the question to begin with, so their definition of "competence" is meaningless. The other tasks might or might not have right answers that DK correctly selected, but we can't verify that for ourselves (OK, I didn't check their grammar test but given the other two are unverifiable why bother).

That's a problem because the DK effect could appear in another situation they didn't consider: what if DK don't actually know the right answers to their questions, but their students do? In that case some students would give the "wrong" (actually right) answers and rate their own confidence highly, because they know their answer is correct and don't realize the professors disagree. Other students might realize that the professors are expecting a different answer and put down the "right" (actually wrong) answer, but they'd know they were playing a dangerous game and so rate their confidence lower. That's all it would take to create the DK pattern in the data without the underlying effect actually existing. To exclude this possibility we have to be able to check that DK's answers to their own test questions are correct, but we can't. Nor should we take it on faith given their dubious approach to question design.
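To make that scenario concrete, here is a toy sketch in Python. Everything in it is an assumption invented for illustration: the number of miskeyed items, the even split between the two kinds of student, and the confidence values 0.9 and 0.6. It says nothing about DK's actual data. It only shows that a wrong answer key plus honest confidence ratings is enough to make low scorers look overconfident.

```python
from statistics import mean

def simulate(n_items=20, n_miskeyed=6, n_students=100):
    """Toy model: every student knows the truly correct answer to every item,
    but the graders' answer key is wrong on n_miskeyed items."""
    results = []
    for s in range(n_students):
        plays_to_key = s % 2 == 0  # assumed: half the students guess what the graders expect
        if plays_to_key:
            score = n_items               # matches the key on every item, miskeyed ones included
            confidence = 0.6              # assumed: unsure, they knowingly wrote "wrong" answers
        else:
            score = n_items - n_miskeyed  # marked wrong on every miskeyed item
            confidence = 0.9              # assumed: sure, their answers are actually correct
        results.append((score, confidence))
    return results

def mean_confidence_by_half(results):
    """Split students into bottom and top half by graded score and
    return the mean self-rated confidence of each half."""
    ranked = sorted(results, key=lambda r: r[0])
    half = len(ranked) // 2
    bottom = mean(c for _, c in ranked[:half])
    top = mean(c for _, c in ranked[half:])
    return bottom, top

bottom, top = mean_confidence_by_half(simulate())
print(f"bottom half mean confidence: {bottom:.2f}")
print(f"top half mean confidence:    {top:.2f}")
```

In this toy model the bottom half by graded score reports higher confidence than the top half, even though nobody in it is incompetent or miscalibrated about what they actually know.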

samhw 4y ago

> The paper reports on the results of a 'logic' test administered to undergrads and uses this to define competence.

Right, but my point is that 'logic' is simply being used as an example of 'a task'. It's immaterial whether it's actually a good test of logic. As long as you agree that whatever it is is a good example of 'a task', then it's equally probative for the purpose of their argument.

mike_hearn (OP) 4y ago

The tasks aren't arbitrary. They're meant to be a proxy for some universal concept of competence. That's why DK is a well-known effect: it claims to hold true for anything, even though they can't test every possible task.

> we presented participants with tests that assessed their ability in a domain in which knowledge, wisdom, or savvy was crucial: humor (Study 1), logical reasoning (Studies 2 and 4), and English grammar (Study 3).

They picked humor because they think it reflects "competence in a domain that requires sophisticated knowledge and wisdom". They then realized the obvious objection - it's subjective - and decided to do the logical reasoning task to try and rebut those complaints (but then why do the first experiment at all?):

> We conducted Study 2 with three goals in mind. First, we wanted to replicate the results of Study 1 in a different domain, one focusing on intellectual rather than social abilities. We chose logical reasoning, a skill central to the academic careers of the participants we tested and a skill that is called on frequently ... it may have been the tendency to define humor idiosyncratically, and in ways favorable to one's tastes and sensibilities, that produced the miscalibration we observed - not the tendency of the incompetent to miss their own failings. By examining logical reasoning skills, we could circumvent this problem by presenting students with questions for which there is a definitive right answer.

So logical reasoning was chosen because:

1. It's objective.

2. It's an important skill.

3. It's a general "intellectual" skill.

That makes it very important whether it's actually a good test of logical reasoning. If it was truly an arbitrary test like an egg-and-spoon race or something, then there'd be no reason to believe the results would generalize to other areas of life and nobody would care.


danbruc 4y ago

For completeness I would add that a good task must allow objectively rating the performance of participants without [much] room for debate. But given that, the whole setup is self-contained and task-independent. Let participants perform the task and establish their competence by rating their performance. Then let participants perform the meta-tasks of rating their performance in absolute and relative terms, and finally check how task and meta-task performances are related.
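As a rough sketch of that last step, relating task performance to meta-task performance, something like the following Python would do. The function names and the sample numbers are hypothetical, made up purely for illustration, and grouping by quartile is just one common way to summarize the relation, not necessarily what any particular study did.

```python
from statistics import mean

def percentile_ranks(scores):
    """Percentile rank of each score: share of participants scoring strictly lower (0-100)."""
    n = len(scores)
    return [100.0 * sum(other < s for other in scores) / n for s in scores]

def calibration_by_quartile(scores, self_estimates):
    """Group participants into quartiles by actual percentile and return, per quartile,
    (mean actual percentile, mean self-estimated percentile).
    Assumes at least four participants."""
    actual = percentile_ranks(scores)
    ranked = sorted(zip(actual, self_estimates))
    n = len(ranked)
    out = []
    for q in range(4):
        chunk = ranked[q * n // 4:(q + 1) * n // 4]
        out.append((mean(a for a, _ in chunk), mean(e for _, e in chunk)))
    return out

# Hypothetical data, invented for illustration only.
scores = [4, 6, 7, 9, 10, 12, 13, 15, 16, 17, 18, 20]          # task performance
self_estimates = [60, 55, 65, 60, 62, 66, 64, 70, 68, 72, 75, 74]  # self-rated percentile

for q, (a, e) in enumerate(calibration_by_quartile(scores, self_estimates), start=1):
    print(f"quartile {q}: actual {a:.1f}, self-estimated {e:.1f}")
```

The gap between the two numbers in each quartile is the miscalibration. Nothing in the computation depends on what the task was, only on the scores being objectively assigned.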
