> There's nothing in the idea that forbids immediate mode UI frameworks from keeping any amount of internal state between frames to keep track of changes over time
When you call ImGui::Button("Hello World") it has no way of telling if it’s the same button as on the previous frame, or a new one. You’re simply not providing that data to the framework.
It’s often good enough for rendering, they don’t really care if it’s an old or new button as long as the GPU renders identical pixels. When you need state however (animations certainly do), the approach is no longer useful.
A framework might use heuristics like match text of the button or relative order of things, but none of them is reliable enough. Buttons may change text, they may be dynamically shown or hidden, things may be reordered in runtime.
> Layout problems can be solved by running several passes over the UI description before rendering happens.
You’re correct that it’s technically solvable. It’s just becomes computationally expensive for complex GUIs. You can’t reliably attach state to visual elements, gonna have to re-measure them every frame.
> There need to be accessibility APIs which let user interface frameworks connect to screen readers
Windows has it for decades now. For Win32 everything is automatic and implemented by the OS. Good custom-painted GUI frameworks like WPF or UWP handle WM_GETOBJECT message and return IAccessible COM interface. That COM interface implements reflection of the complete GUI, exposing an objects hierarchy. I’m not saying it’s impossible to do with immediate rendering, just very hard without explicit visual tree that persists through the lifetime of the window.