Transformers operate over a sequence context, but they construct their own context-dependent notion of order through attention.
Persistent or recurrent activation states could extend the context window beyond current token limits. Better still would be dynamic construction: new knowledge carefully grafted into a network without training, with updates to the recurrent states feeding back to modify learned structures.
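The first of these ideas, a persistent activation state carried across segments so that attention can reach past the current window, can be mocked up with plain dot-product attention. This is only an illustrative sketch: the function names, shapes, and the keep-the-last-rows memory rule are all assumptions, not a real implementation.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention; each query row attends over
    # every row of k/v, i.e. the memory plus the current segment.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def process_segment(segment, memory):
    # Keys/values span the persistent memory AND the new segment,
    # so the effective context exceeds the segment length.
    context = np.concatenate([memory, segment], axis=0)
    out = attention(segment, context, context)
    # Carry the most recent activations forward as the next memory
    # (in a trained model these would be detached from the gradient).
    new_memory = out[-memory.shape[0]:]
    return out, new_memory

rng = np.random.default_rng(0)
d_model, mem_len, seg_len = 8, 4, 6
memory = np.zeros((mem_len, d_model))
for _ in range(3):  # successive segments share one carried-over state
    segment = rng.normal(size=(seg_len, d_model))
    out, memory = process_segment(segment, memory)
```

In a real system the carried state would more likely be stored hidden activations than attention outputs, and grafting new knowledge would mean writing into that state directly rather than retraining weights.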
Spiking networks might provide a clean architecture for some of those goals, but that is really just recurrence shuffled into different stages of processing.