I don't understand exactly how that would work. At some point the generation will introduce new events and characters, new places and objects, and give them names. When it comes time to summarise, won't some of those names be lost simply because there isn't room in the summary for all of them? The same goes for all sorts of detail, named or not. What then happens to the parts of the narrative that involve those forgotten characters, objects, and so on?
The main idea, continuously feeding the model a summary of its own generation (and of its dialog with the user, of course), sounds interesting, but it's still not a memory. Eventually one of two things has to happen: either the running summary grows large enough that it again exceeds the system's buffer (its "short-term memory"), or it drops so much detail that the model loses the plot.
So while this may allow longer generations, it doesn't look like it will really solve the problem of "long-term memory", or of long-distance dependencies. It's a smart trick, but that's not enough.