Let's look at the second test. It's advertised as a "logic test". The description is:
> Participants then completed a 20-item logical reasoning test that we created using questions taken from a Law School Admissions Test (LSAT) test preparation guide (Orton, 1993).
That's the entire description of their method. So immediately, we can see the following problems:
1. Just like the joke test, there's no way to replicate this given the description in the paper. Which questions did they take, and why? In turn, this calls into question every claim that the DK study has been replicated.
2. The citation is literally a Cliffs Notes study guide for students. It's about memorization of answers to pass law exams, not an actual test designed to verify logical reasoning ability. Why do they think this is a good source of questions for testing logic? Law is not a system of logic. There's even a famous saying about that, from Oliver Wendell Holmes Jr.: "The life of the law has not been logic: it has been experience." If you wanted to test logical reasoning, a more standard approach would be something like Raven's Progressive Matrices.
Putting my two posts together, there's a third problem:
3. Setting aside the obvious problems with subjectivity, their joke test is defined in an illogical way. They define a test of expertise (working as a comedian), select some people who pass this test and define them as experts, then discover that one expert would have been ranked by their own test as "incompetent but doesn't know it". Yet this is a contradiction, because this person was selected specifically because the researchers defined them as competent. Rather than deal with this logical contradiction by reframing the question, they simply ignore it by discarding that comedian from their expert pool.
This is good evidence that DK themselves weren't particularly logical people, yet they claim to have designed a test of logic - a bold claim at the best of times. Ironically, it appears DK may be suffering from their own effect: they believe themselves to be competent at designing tests, yet the evidence in their paper suggests they aren't.