This needs to be shown. For example, asking for something that is clearly in the training data (like Paul Grahams cv) is certainly not a proper way to test context recall
That is the point. Long book, checking the long context to see if remembers about the first sentence. Or you mean as a test it is better to randomly place the "needle"?