It's distinct, but not very: it's an EMA that doesn't assume uniform time steps. The stability of an EMA has nothing to do with integrators in control theory, and neither do these models.
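To make the "EMA without uniform time" point concrete, here is a minimal sketch of an EMA over irregularly spaced samples. The function name `irregular_ema` and the time constant `tau` are my own illustrative choices, not anything taken from these models. The decay is computed from the actual gap between samples, so each update is a convex blend of the old state and the new value, which is why the output stays bounded by the inputs.

```python
import math

def irregular_ema(times, values, tau):
    """EMA over irregularly spaced samples.

    times  -- nondecreasing sample timestamps
    values -- one value per timestamp
    tau    -- hypothetical time constant (same units as times)
    """
    state = values[0]
    out = [state]
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        # Decay depends on the elapsed time, not on a fixed step size.
        # For dt >= 0 it lies in (0, 1], so the update is a convex blend.
        decay = math.exp(-dt / tau)
        state = decay * state + (1.0 - decay) * values[i]
        out.append(state)
    return out
```

With a constant gap between samples, `decay` is the same at every step and this reduces to the ordinary fixed-coefficient EMA.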
These models aren't really RNNs: they have only a linear gate, which cannot depend on previous tokens at this layer, so they can't update their state in a way that depends much on the current state.
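Roughly what I mean, as a toy scalar sketch (all names here, `gated_linear_scan`, `w_gate`, `w_in`, `h0`, are hypothetical, and this is not the actual architecture of any particular model): the gate is computed from the current input only, never from the state `h`, so the state update is linear in `h`.

```python
import math

def gated_linear_scan(xs, w_gate, w_in, h0=0.0):
    """Toy scalar recurrence with an input-dependent gate.

    The gate a_t is a function of x_t alone; it never reads h,
    so h_t = a_t * h_{t-1} + (1 - a_t) * w_in * x_t is linear in h.
    """
    h = h0
    hs = []
    for x in xs:
        a = 1.0 / (1.0 + math.exp(-w_gate * x))  # sigmoid gate from the input only
        h = a * h + (1.0 - a) * w_in * x
        hs.append(h)
    return hs
```

Because the gate ignores `h`, the final state is an affine function of the initial state: changing `h0` only shifts the result by `h0` times the product of the gates. A classic RNN would instead pass `h` through a nonlinearity together with `x`, which is what lets its update depend on the current state.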