Section 2 of the blog post is no longer very relevant. Later advances (DSS, S4D) simplified that part of the process. Arguably the whole thing should also be updated for Mamba (same authors).
I still find it really confusing how a linear model can perform so well.
Some of the concepts are better explained here than anywhere else, and they make it straightforward to make sense of Mamba, which is increasingly popular.
S4D: On the Parameterization and Initialization of Diagonal State Space Models (https://arxiv.org/abs/2206.11893)
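For anyone curious what the diagonal simplification looks like in practice, here is a rough sketch in plain Python. It assumes the parameters are already discretized and skips everything S4D does around initialization and discretization, so treat it as an illustration of the idea rather than the paper's implementation. The names (`a_bar`, `b_bar`, `c`, `kernel`, and so on) and the numbers are made up for the example. With a diagonal state matrix, the convolution kernel reduces to a Vandermonde-style sum, and it gives the same output as stepping the linear recurrence.

```python
# Sketch: a diagonal linear SSM computed two ways (recurrence vs. convolution).
# Assumption: a_bar, b_bar, c are already-discretized, hypothetical parameters.

def kernel(a_bar, b_bar, c, length):
    """K[l] = sum_n c[n] * a_bar[n]**l * b_bar[n] (a Vandermonde-style sum)."""
    return [
        sum(cn * an**l * bn for an, bn, cn in zip(a_bar, b_bar, c))
        for l in range(length)
    ]

def run_recurrence(a_bar, b_bar, c, u):
    """Step x_k = a_bar * x_{k-1} + b_bar * u_k, then y_k = sum_n c[n] * x_k[n]."""
    x = [0j] * len(a_bar)  # one complex state per diagonal entry
    ys = []
    for uk in u:
        x = [an * xn + bn * uk for an, bn, xn in zip(a_bar, b_bar, x)]
        ys.append(sum(cn * xn for cn, xn in zip(c, x)))
    return ys

def run_convolution(k, u):
    """Causal convolution y_i = sum_{j<=i} K[j] * u[i-j]."""
    return [sum(k[j] * u[i - j] for j in range(i + 1)) for i in range(len(u))]

# Made-up values: two diagonal entries inside the unit circle, a short input.
a_bar = [0.9 + 0.1j, 0.5 - 0.3j]
b_bar = [1.0 + 0j, 1.0 + 0j]
c = [0.5 + 0.2j, -0.3 + 0.1j]
u = [1.0, 0.0, 2.0, -1.0]

y_rec = run_recurrence(a_bar, b_bar, c, u)
y_conv = run_convolution(kernel(a_bar, b_bar, c, len(u)), u)

# Both views of the same linear model agree up to floating-point error.
print(max(abs(r - v) for r, v in zip(y_rec, y_conv)))
```

The outputs stay complex here to keep the sketch short. Real implementations typically take the real part and use an FFT for the convolution.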