This has nothing to do with SSE. It's trivial to persist state across disconnects and refreshes with SSE, and you can do all the same pub/sub tricks.
None of these companies are even using brotli on their SSE connections, which can give 40-400x compression.
It's just bad engineering, and it's going to be much worse with WebSockets: you have to rebuild HTTP from scratch, compression is nowhere near as good, bidirectional traffic nukes your mobile battery because of the duplex antenna, etc., etc.
So the only thing you get from WebSockets is bidirectional events (at the cost of all the production challenges WebSockets bring). In practice, most problems don't need that feature.
Go to ChatGPT.com while logged in and start typing right away: eight words in, it clears the text in the form. Why?
Very weird that the foundational LLM companies' own chat pages don't do this.
Dunno, in my Go+HTMX project it was pretty trivial to add SSE streaming.

When you open a new chat tab, we load existing data from the DB, then HTMX initiates SSE streaming with a single tag. When the server receives the SSE request from HTMX, it spawns a goroutine and registers a new Go channel for that tab. The goroutine blocks, waiting for new events on the channel.

When something triggers a new message, a dispatcher saves the event to the DB and then iterates over the registered channels, sending the event to each. On a new event in a tab's channel, that tab's goroutine unblocks and passes the event to the SSE stream; HTMX handles inserting the new data into the DOM.

When a tab closes, the goroutine is notified via the request's context (another Go primitive), deregisters the channel, and exits. If the server restarts, HTMX automatically reopens the SSE stream.

It took probably one evening to implement.
Wasn't that complex!
Or is it that all your tokens go through a DB anyway?
It's fairly easy to keep an agent alive when a client goes away. It's a lot harder to attach the client back to that agent's output when the client returns, without stuffing every token through the database.
CRDT or OT would work great here, and are arguably even overkill. But so many of the edge cases you'd usually need to think about just disappear.
(I've built an agent / chat that used CRDT to represent the chat. You can have an arbitrary number of tabs, closing/opening at any time. All real time, in sync.)
Some are using Google Gemini.
It saves your chats, which are presented in a pane you can expand on the left and search. You can jump back into any chat and continue it, or delete individual chats.
This history is attached to your Google account, not to the chat window. You can pick up an existing chat in another browser on another device where you are authenticated with the same Google identity.
Now, about the specific usage scenario in the article (hitting refresh immediately after submitting a prompt, while the response is streaming): not sure why that would be important?
I just tried it a couple of times. Both times, it initially looked as if the Gemini interface had lost the chats, since they didn't appear in the chat-history section of the left pane. But after another refresh they appeared, so there's just some delay.
Anyway, Gemini seems good in this regard; at least they give a damn.
Yes, of course all chat providers store your chats, and they'll be available eventually, once the response has finished streaming and been dumped to a DB.
This is about live streaming getting lost and not being reconnected (and restreamed) when you refresh the page.
And since chatting with an AI and watching the responses stream in is a major use case, the author was right to question why e.g. Anthropic wouldn't invest some of its $30B in fixing this glaring problem.
Especially since it looks like your initial message wasn't received by the backend server at all!
It may not be super critical, but it's like saying, "My Ferrari sometimes shows the wrong speed. It still drives, but the speedometer is stuck. It does get back to the correct speed eventually, though, so no biggie."