It's hard, but if you have a piece of fiction or non-fiction it hasn't seen before, then a deep reading comprehension question can be a good indicator. But you need to be able to separate a true answer from BS.
"What does this work says about our culture? Support your answer with direct quotes."
I found both gpt-4 and haiku to do alright at this, but sometimes give answers that imply fixating on certain sections of a 20,000 k context. You could compare it against chunking the text, getting the answer for each chunk and combining them.
I suspect if you do that then the chunking would win for things that are found in many chunks, like the work is heavy handed on a theme, but the large context would be better for a sublter message, except sometimes it would miss it altogether and think a Fight Club screenplay was a dark comedy.
Interpretation is hard I guess.
No comments yet.