Your blue book is being graded by a stressed out and very underpaid grad student with many better things to do. They're looking for keywords to count up, that's it. The PI gave them the list of keywords, the rubric. Any flourishes, turns of phrase, novel takes, those don't matter to your grader at 11 pm after the 20th blue book that night.
Yeah sure, that's not your school, but that is the reality of ~50% of US undergrads.
But again, the test creator matters a lot here too. Making such an exam is quite a lot of labor, especially as many/most PIs have better things to do. Their incentives are grant money, then papers, then in a distant 3rd their grad students, and finally undergrad teaching. Many departments are explicit about this. Spending limited time on a good undergrad multiple-choice exam is not in the PI's best interest.
Which is why, in this case of a good Scantron exam, they're likely to just farm it out to Claude. Cheap, easy, fast, good enough. A winner in all dimensions.
Also, as an aside to the above, an AI with OCR for your blue book would likely be the best realistic grader too. Needs less coffee, after all.
Now that I haven't been a student in a long time and (maybe crucially?) that I am friends with professors and in a relationship with one, I get it. I don't think it would be appropriate for a higher level course, but for a weed-out class where there's one Prof and maybe 2 TAs for every 80-100 students it makes sense.
As someone who has been part of the production of quite a few high-stakes MC tests, I agree with this.
That said, a professor would need to work with a professional test developer to make an MC test that is consistently good, valid, and reliable.
Some universities have test dev folks as support, but many/most/all of them are not particularly good at developing high quality MC tests imho.
So, for anyone in a spot to do this: start test dev very early. Ideally, build an item bank that is constantly growing and being refined, and include some problem types that can be varied from year to year, with heuristics for keys and distractors, so that items can be iterated on over the years while still maintaining their validity. Also, consider removing outliers from the scoring pool, and make sure to tell students to focus on answering all questions rather than spinning their wheels on one, so that naturally persistent examinees are less likely to be punished by poor item writing.
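To make the "problem types that can be varied from year to year" idea concrete, here is a rough Python sketch of one parameterized item. Everything in it is made up for illustration (the `Item` class, `make_speed_item`, the speed problem, and the particular wrong-operation distractors); it is not how any real testing program does it, just one way the key and distractor heuristics could be written down so a new variant falls out of a new seed.

```python
import random
from dataclasses import dataclass


@dataclass
class Item:
    stem: str
    options: list[str]
    key_index: int  # position of the correct answer in options


def make_speed_item(seed: int) -> Item:
    """Build one variant of a hypothetical 'average speed' item.

    The seed stands in for the exam year or form, so the same seed
    always reproduces the same variant.
    """
    rng = random.Random(seed)
    hours = rng.choice([2, 4, 5])
    speed = rng.choice([30, 40, 60, 80])
    distance = speed * hours

    key = speed
    # Distractor heuristics (assumed, for illustration): each one is the
    # result of applying the wrong operation to the two given numbers.
    distractors = [
        distance * hours,  # multiplied instead of divided
        distance - hours,  # subtracted instead of divided
        distance + hours,  # added instead of divided
    ]

    values = [key] + distractors
    if len(set(values)) != len(values):
        raise ValueError("distractor collides with the key or another option")

    # Shuffle option order; index 0 in `values` is the key.
    order = list(range(len(values)))
    rng.shuffle(order)
    options = [f"{values[i]} km/h" for i in order]

    stem = (
        f"A car travels {distance} km in {hours} hours. "
        "What is its average speed?"
    )
    return Item(stem=stem, options=options, key_index=order.index(0))


if __name__ == "__main__":
    item = make_speed_item(seed=2024)
    print(item.stem)
    for letter, option in zip("ABCD", item.options):
        print(f"  {letter}. {option}")
    print("Key:", "ABCD"[item.key_index])
```

The point of tying each distractor to a specific mistake is that the variants stay comparable across years: the numbers change, but what each wrong answer diagnoses does not. You would still want to check real response data before trusting any given variant.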
You can solve that but it's a combinatorial explosion.