> [...] we can take to gain orders of magnitude more performance, just like the leap that the Transformers paper had.
Afaik the most important benefit of transformers isn't their “performance” (in the sense of how well they perform their tasks) but their scalability, which comes from the fact that they can be trained and evaluated efficiently on big GPU clusters. That isn't something you can do with recurrent neural networks, because each time step depends on the result of the previous one.
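
To make that concrete, here's a toy numpy sketch (not any real library's code, just my own simplified illustration): the RNN loop has to run step by step because each hidden state needs the previous one, while self-attention over the whole sequence (causal mask omitted for brevity) is a handful of matrix multiplications a GPU can do in parallel.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                     # sequence length, hidden size
x = rng.normal(size=(T, d))     # input sequence

# RNN: each hidden state depends on the previous one, so the T steps
# must be computed one after another (hard to parallelize over time).
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
h = np.zeros(d)
for t in range(T):
    h = np.tanh(x[t] @ W + h @ U)   # step t needs the result of step t-1

# Single-head self-attention: every position is computed from the whole
# sequence with a few big matrix multiplications, which parallelize well.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
out = attn @ V                      # all T outputs at once
```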
And then, if I understood correctly, the benefit of state-space models is that you can train them in parallel but run them in a recurrent fashion, which makes inference cheaper than transformers, especially as the context size grows.
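
Here's a minimal numpy sketch of that dual view, assuming a plain linear SSM (nothing like the actual S4/Mamba code, just the idea): the same recurrence can be evaluated step by step with a fixed-size state (cheap inference, no growing KV cache) or unrolled into a convolution over the whole sequence (parallelizable training), and both give identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 16, 3
A = np.diag(rng.uniform(0.1, 0.9, size=d))   # stable diagonal transition
B = rng.normal(size=d)
C = rng.normal(size=d)
x = rng.normal(size=T)                       # scalar input sequence

# 1) Recurrent view: h_t = A h_{t-1} + B x_t, y_t = C h_t.
#    One fixed-size state, constant work per new token at inference time.
h = np.zeros(d)
y_rec = np.empty(T)
for t in range(T):
    h = A @ h + B * x[t]
    y_rec[t] = C @ h

# 2) Convolution view: unroll the recurrence into a kernel k[j] = C A^j B
#    and convolve it with the input; each output position only depends on
#    the inputs, so the whole sequence can be computed in parallel.
k = np.array([C @ np.linalg.matrix_power(A, j) @ B for j in range(T)])
y_conv = np.array([sum(k[j] * x[t - j] for j in range(t + 1)) for t in range(T)])

assert np.allclose(y_rec, y_conv)   # both views produce the same outputs
```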