This isn't a data science question. Especially if the data science is blind to meter and to phonetics.
But I agree it's silly to say that "the" is what makes Macbeth creepy, and not, you know, the occult theme that permeates it.
"Give me thuh cat toy." (Some ordinary toy.)
"Give me thee cat toy." (The one with special powers.)
> [...] Look like th' innocent flower,/But be the serpent under ’t.
It is still acceptable in modern English to say something like
> Seem like the innocent flower, but be as the serpent underneath it.
Certainly not casual, everyday speech -- but using a rhetorical strategy of referring to an archetypal innocent flower, or an archetypal serpent. I think it's an enormous stretch to claim that Lady Macbeth and Macbeth had a specific innocent flower in mind when they were speaking.
"Then he offered it to him again; then he put it by again; but, to my thinking, he was very loath to lay his fingers off it. And then he offered it the third time; he put it the third time by;"
Or Much Ado About Nothing:
"I have the toothache."
It is not surprising that the way (a way?) articles are used has changed in the last 400 years.
Then again, the English say "I'm going to hospital" where an American would say "I'm going to the hospital", so maybe the Bard used up all of the (how do you pluralize the??) so that the English use it less? At least the TFA author might theorize as such.
The fact that "the" happens to be used more often in Macbeth than in other Shakespeare plays seems to me more likely to be noise with no deeper meaning.
I have a headache and a toothache.
I have a cold and a fever. I have yellow fever.
I have a cough. I have whooping cough.
I have the flu.
I have the chicken pox, smallpox, the measles, and the mumps. I have rabies.
for f in *.txt; do echo "$f"; tr " \r" "\n\n" < "$f" | grep -A100000 ACT | grep -Ev "^[A-Z]+$" | grep -i "[A-Z]" | grep -Ei "^th['e]$" | wc -l; tr " \r" "\n\n" < "$f" | grep -A100000 ACT | grep -Ev "^[A-Z]+$" | grep -i "[A-Z]" | wc -l; echo; done
Here were the results that gave me, formatted into a table and sorted by descending frequency:
| Rank | Play | The or Th’ | Words | Per 10000 |
|------+-----------------------------+------------+-------+-----------|
| 1 | macbeth | 724 | 16929 | 427.7 |
| 2 | henry-v | 1065 | 25577 | 416.4 |
| 3 | coriolanus | 1126 | 27294 | 412.5 |
| 4 | loves-labors-lost | 855 | 21093 | 405.3 |
| 5 | henry-viii | 962 | 24074 | 399.6 |
| 6 | the-merchant-of-venice | 834 | 20985 | 397.4 |
| 7 | henry-iv-part-2 | 1001 | 25762 | 388.6 |
| 8 | henry-vi-part-2 | 990 | 25597 | 386.8 |
| 9 | hamlet | 1142 | 30006 | 380.6 |
| 10 | henry-iv-part-1 | 856 | 24100 | 355.2 |
| 11 | henry-vi-part-3 | 866 | 24491 | 353.6 |
| 12 | antony-and-cleopatra | 861 | 24465 | 351.9 |
| 13 | king-lear | 898 | 25661 | 349.9 |
| 14 | the-winters-tale | 854 | 24568 | 347.6 |
| 15 | king-john | 717 | 20730 | 345.9 |
| 16 | cymbeline | 959 | 27738 | 345.7 |
| 17 | a-midsummer-nights-dream | 564 | 16377 | 344.4 |
| 18 | richard-iii | 985 | 28914 | 340.7 |
| 19 | richard-ii | 753 | 22224 | 338.8 |
| 20 | henry-vi-part-1 | 715 | 21575 | 331.4 |
| 21 | pericles | 605 | 18282 | 330.9 |
| 22 | troilus-and-cressida | 837 | 25810 | 324.3 |
| 23 | titus-andronicus | 659 | 20621 | 319.6 |
| 24 | alls-well-that-ends-well | 724 | 22683 | 319.2 |
| 25 | measure-for-measure | 693 | 21858 | 317.0 |
| 26 | the-tempest | 518 | 16489 | 314.1 |
| 27 | the-comedy-of-errors | 455 | 14552 | 312.7 |
| 28 | the-two-noble-kinsmen | 735 | 23751 | 309.5 |
| 29 | julius-caesar | 592 | 19251 | 307.5 |
| 30 | as-you-like-it | 664 | 21692 | 306.1 |
| 31 | twelfth-night | 573 | 19675 | 291.2 |
| 32 | othello | 737 | 25670 | 287.1 |
| 33 | much-ado-about-nothing | 591 | 20843 | 283.5 |
| 34 | romeo-and-juliet | 677 | 23948 | 282.7 |
| 35 | the-merry-wives-of-windsor | 604 | 21603 | 279.6 |
| 36 | timon-of-athens | 504 | 18262 | 276.0 |
| 37 | the-taming-of-the-shrew | 449 | 18709 | 240.0 |
| 38 | the-two-gentlemen-of-verona | 404 | 17010 | 237.5 |
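For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the Per 10000 column and each play's gap to the top row. The counts are copied from four rows of the table above; the helper names `per_10k` and `percent_below` are made up for illustration.

```python
# (the-or-th' count, total words), copied from four rows of the table above
counts = {
    "macbeth": (724, 16929),
    "henry-v": (1065, 25577),
    "loves-labors-lost": (855, 21093),
    "the-two-gentlemen-of-verona": (404, 17010),
}


def per_10k(hits, words):
    """Occurrences per 10,000 words, rounded to one decimal like the table."""
    return round(10000 * hits / words, 1)


def percent_below(rate, top_rate):
    """How far `rate` sits below `top_rate`, as a percentage of `top_rate`."""
    return round(100 * (1 - rate / top_rate), 1)


rates = {play: per_10k(hits, words) for play, (hits, words) in counts.items()}
top = rates["macbeth"]
for play, rate in rates.items():
    print(f"{play:30s} {rate:6.1f}  {percent_below(rate, top):5.1f}% below macbeth")
```

The gap is measured relative to the Macbeth rate, not as a difference in percentage points.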
Ignoring the whole log-likelihood stuff and just looking at the simple frequencies, I'm not completely sure that I buy the article's argument. Macbeth does come out on top by my analysis. But some of the other plays seem to use "the" or "th'" nearly as frequently without being particularly creepy. In terms of ratios of the frequencies, Henry V, a history, is only 2.6% lower than Macbeth. And the first comedy, Love's Labor's Lost, is just 5.2% lower.
Like if we assumed that all English language is generated from a weighted distribution of all words and “the” is 3.5%, is a 4.3% occurrence rate even significant? (And what even would be the base occurrence rate?)
I’d also be interested in seeing if the 2:1 difference isn’t larger than for other authors?
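One rough way to put a number on the "is 4.3% even significant" question is a one-sample z-test for a proportion. This is only a sketch under loud assumptions: every word is treated as an independent draw (real text is not like that, so the result overstates significance), and the 3.5% base rate is the hypothetical figure from the comment above, not a measured one. The function names are made up for illustration.

```python
import math


def z_score(hits, n_words, base_rate):
    """z-score of the observed rate under a binomial model with `base_rate`.

    Assumes independent words, which is a simplification.
    """
    observed = hits / n_words
    std_err = math.sqrt(base_rate * (1 - base_rate) / n_words)
    return (observed - base_rate) / std_err


def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Macbeth counts from the table; 0.035 is the assumed (hypothetical) base rate.
z = z_score(724, 16929, 0.035)
print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.1e}")
```

With the Macbeth counts this gives a z-score of roughly 5.5, which looks decisive under these assumptions. But the answer depends entirely on the base rate you pick, and the table shows Shakespeare's own plays ranging from about 2.4% to 4.3%, so the base-rate question does most of the work.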
| 1 | macbeth | 724 | 16929 | 252 |
| 2 | henry-v | 1065 | 25577 | 162 |
| 3 | coriolanus | 1126 | 27294 | 151 |
| 4 | loves-labors-lost | 855 | 21093 | 192 |
| 5 | henry-viii | 962 | 24074 | 165 |
| 6 | the-merchant-of-venice | 834 | 20985 | 189 |
| 7 | henry-iv-part-2 | 1001 | 25762 | 150 |
| 8 | henry-vi-part-2 | 990 | 25597 | 151 |
| 9 | hamlet | 1142 | 30006 | 126 |
| 10 | henry-iv-part-1 | 856 | 24100 | 147 |
| 11 | henry-vi-part-3 | 866 | 24491 | 144 |
| 12 | antony-and-cleopatra | 861 | 24465 | 143 |
| 13 | king-lear | 898 | 25661 | 136 |
| 14 | the-winters-tale | 854 | 24568 | 141 |
| 15 | king-john | 717 | 20730 | 166 |
| 16 | cymbeline | 959 | 27738 | 124 |
| 17 | a-midsummer-nights-dream | 564 | 16377 | 210 |
| 18 | richard-iii | 985 | 28914 | 117 |
| 19 | richard-ii | 753 | 22224 | 152 |
| 20 | henry-vi-part-1 | 715 | 21575 | 153 |
| 21 | pericles | 605 | 18282 | 181 |
| 22 | troilus-and-cressida | 837 | 25810 | 125 |
| 23 | titus-andronicus | 659 | 20621 | 154 |
| 24 | alls-well-that-ends-well | 724 | 22683 | 140 |
| 25 | measure-for-measure | 693 | 21858 | 145 |
| 26 | the-tempest | 518 | 16489 | 190 |
| 27 | the-comedy-of-errors | 455 | 14552 | 214 |
| 28 | the-two-noble-kinsmen | 735 | 23751 | 130 |
| 29 | julius-caesar | 592 | 19251 | 159 |
| 30 | as-you-like-it | 664 | 21692 | 141 |
| 31 | twelfth-night | 573 | 19675 | 148 |
| 32 | othello | 737 | 25670 | 111 |
| 33 | much-ado-about-nothing | 591 | 20843 | 136 |
| 34 | romeo-and-juliet | 677 | 23948 | 118 |
| 35 | the-merry-wives-of-windsor | 604 | 21603 | 129 |
| 36 | timon-of-athens | 504 | 18262 | 151 |
| 37 | the-taming-of-the-shrew | 449 | 18709 | 128 |
| 38 | the-two-gentlemen-of-verona | 404 | 17010 | 139 |
with only 17 and 27 breaking 200, and still well shy of 252.
But the real point of the article is that the oddity of "the" in the frequency table attracted their attention to that word, and led them to identify an actual peculiarity in its usage. To say henry-v demonstrates anything similar, you would need to check if usage in that play is similarly peculiar (which I have not done either).
It seems odd to suggest (as some commenters have done) that the difference was subconscious. My null hypothesis is that peculiarities in usage by a professional wordsmith are deliberate. I expect to see actual evidence that the author didn't know what he was up to.
If we assume that length of a play has an influence on the frequency of stop words, shouldn’t we compare samples of each play? (First x pages or y randomly sampled words)
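A minimal sketch of what that comparison could look like, assuming each play has already been split into a non-empty list of words (the tokenization is not shown, and the function names are hypothetical). It covers both variants: the first x words, and y randomly sampled words repeated a few times.

```python
import random


def is_the(word):
    """True for 'the' or the elided form "th'", ignoring case."""
    return word.lower() in ("the", "th'")


def rate_first_n(words, n):
    """Rate of 'the' among the first n words (assumes `words` is non-empty)."""
    head = words[:n]
    return sum(is_the(w) for w in head) / len(head)


def rate_random_samples(words, sample_size, n_samples, seed=0):
    """Rate of 'the' in each of n_samples random samples drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    rates = []
    for _ in range(n_samples):
        sample = rng.sample(words, sample_size)
        rates.append(sum(is_the(w) for w in sample) / sample_size)
    return rates


# Toy word list standing in for a tokenized play.
toy = ["the", "cat", "sat", "on", "th'", "mat", "and", "The", "dog", "ran"]
print(rate_first_n(toy, 5))
print(rate_random_samples(toy, sample_size=5, n_samples=3))
```

Using the same sample size for every play removes length as a variable, and the spread across the random samples gives a feel for how much of the between-play difference could be noise.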
That said, although these days it is uncommon to use generic nouns with the definite article ("the"), I understand that this was a lot more common in Shakespeare's day. I wonder if this is more common in Macbeth than in Shakespeare's other plays, whether it was a deliberate choice, and whether Jacobean audiences would have felt the same sense of creepiness.
But I agree, we don't know what the real reason was for Shakespeare's choice in this case, just that it explains what contributes to the creepiness for modern readers.
> Macbeth is a creepy play.
But my brain wanted really badly to swap the emphasised word:
> Macbeth is a creepy play.
Edit: Maybe that was the point of the author. A couple of paragraphs later they say “Actors and critics have long remarked that when you read Macbeth out loud, it feels like your voice and mouth and brain are doing something ever so slightly wrong. There’s something subconsciously off about the sound of the play, and it spooks people.”
Example: https://jayshams.medium.com/moby-dick-is-not-a-novel-e19e41f...
Shakespeare's famous romantic scene does exactly the same thing: "it is the east, and Juliet is the sun." That's how he wrote.
> But fans of Macbeth often say its freaky qualities are deeper than just the plot devices and characters. For centuries, people have been unsettled by the very language of the play.
> Actors and critics have long remarked that when you read Macbeth out loud, it feels like your voice and mouth and brain are doing something ever so slightly wrong. There’s something subconsciously off about the sound of the play, and it spooks people. It’s as if Shakespeare somehow wove a tiny bit of creepiness into every single line. The literary scholar George Walton Williams described the “continuous sense of menace” and “horror” that pervades even seemingly innocuous scenes.
> For centuries, Shakespeare fans and theater folk have wondered about this, but could never quite explain it.
The article claims to explain it - in the play, the word "the" is used a lot!
Um.. gee, in high school it was pointed out to me how constant themes in Macbeth are how unnatural things have become, how everything is strange, qualities/values reversed from normal - fair is foul and foul is fair etc, animals doing weird things, bad omens etc. It never stops, all the way through. I had to go through the play and list how many animals are mentioned, doing strange things. It's constant. People meet and it's not "Lovely day isn't it" but an anecdote about how so-and-so saw something incredibly weird and impossible happen. Over that background is the quickly escalating paranoia and madness of Macbeth & Lady Macbeth. Etc. Can't be bothered writing more, I didn't want to say just "This is total nonsense.", but it is. (Flagged.)
As noted in the article, the Scottish play displays other oddities. But this one had not been commented upon before. To demonstrate vacuity, you would need to identify other plays that do not come off as creepy, but use "the" in the way noted. For an academic presentation, we might expect the authors to have checked for it in other plays, but this article is not one.