Yeah, I also tried to get it to complete some limericks from the dataset. Curiously, it believed it had heard of the limerick but would then recite a hallucination.
So the good news is that the NIAN score might be real; the bad news is that you can't rely on the model to know what it knows.