Is it possible to use this concept to keep a very long session going? For example, telling the model to forget things, or replacing part of its memory, without "rebooting" (starting another instance or conversation)?
So far, I haven't been able to find any information on how to do that. In the OS analogy, what I found looks more like putting things in autoexec.bat to open on the next boot than proper memory management during execution.
It's more like "autoexec.bat engineering". Is that the same overall idea?
Time to first token is really expensive, like a reboot. Any tool that reduces it and keeps the model running longer would be a real breakthrough, but I haven't found any practical examples of it, just things that are "autoexec.bat" analogues.
I could design an autoexec.bat that remembers which programs were open and reopens them after a reboot, all automatically. If I open something, it goes in; if I close it, I remove it from autoexec.bat. macOS does this. But that's not the kind of persistence that saves me time and money. macOS is good because _I rarely need to reboot it_, and the "reopen windows after reboot" option is barely used.
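To make the analogy concrete, here is a minimal, hypothetical sketch of that "autoexec.bat" bookkeeping applied to an LLM session: context items are added when opened, dropped when closed, and every new session rebuilds the prompt from whatever remains. The class and method names are made up for illustration and are not any real tool's API, and the word count is only a crude stand-in for tokens.

```python
# Hypothetical sketch: "autoexec.bat"-style context bookkeeping for an LLM session.
# Names are illustrative only; this is not a real library's API.


class AutoexecContext:
    """Tracks which context items are 'open' and rebuilds a prompt from them."""

    def __init__(self) -> None:
        # Insertion-ordered mapping: item name -> text, like lines in autoexec.bat
        self.items: dict[str, str] = {}

    def open(self, name: str, text: str) -> None:
        # Opening something appends it to the startup script
        self.items[name] = text

    def close(self, name: str) -> None:
        # Closing something removes it from the startup script (no error if absent)
        self.items.pop(name, None)

    def boot_prompt(self) -> str:
        # On every "reboot" (new conversation) the whole script is replayed,
        # so the model has to process all of it again before the first token.
        return "\n".join(self.items.values())

    def prefill_cost(self) -> int:
        # Crude proxy: whitespace-separated words stand in for tokens
        return len(self.boot_prompt().split())


if __name__ == "__main__":
    ctx = AutoexecContext()
    ctx.open("notes", "project notes go here")
    ctx.open("spec", "the API spec text")
    ctx.close("notes")
    print(ctx.boot_prompt())
    print(ctx.prefill_cost())
```

Keeping the list tidy only shrinks what gets replayed; in this sketch, every reboot still pays to reprocess everything that is left, which is exactly the cost I'm asking about.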
There's one question I posed there that perfectly captures my doubts:
_Can I use this "context engineering" to mitigate the costs of the time for first token?_
If I can't, then it's just like rebooting an OS, and it's merely the illusion of persistence. I can do this on my own, just like I can craft hacky autoexec.bat scripts; there's nothing special about it.
I've seen attempts at "snapshotting" parts of GPU memory, which is similar to pausing a VM after boot and then restoring it. That's also not what I'm talking about: it's just an optimization of the reboot process and doesn't improve much on time to first token (there's a time penalty either way).