It’s much more than just an LLM. The Mamba architecture is often used as the backbone of an LLM, but you can use it more generally as a linear-time (as opposed to quadratic-time) sequence modeling architecture (as per the original paper’s title, which is cited in the linked repo). It is much closer to a convolutional network or an RNN (it has bits of both) than to a transformer. It is based on the notion of state spaces (with a twist).
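To make the state-space idea concrete, here is a toy sketch (my own illustration, not the actual Mamba kernel, which adds input-dependent parameters and a hardware-aware parallel scan): a discretized linear state-space model updates a hidden state once per timestep, h_t = A h_{t-1} + B x_t, y_t = C h_t, so a sequence of length T costs O(T) rather than the O(T²) of attention.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time scan of a toy SSM: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # one pass over the sequence: linear in T
        h = A @ h + B * x_t    # recurrent state update (RNN-like)
        ys.append(C @ h)       # readout to a scalar output
    return np.array(ys)

rng = np.random.default_rng(0)
T, d_state = 16, 4
A = 0.9 * np.eye(d_state)          # stable toy dynamics
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
y = ssm_scan(rng.standard_normal(T), A, B, C)
print(y.shape)
```

The same recurrence can also be unrolled into a convolution with kernel entries C A^k B, which is the dual view that gives these models their CNN flavor during training.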
I use Mamba, for instance, to build surrogate models of physics-based building energy models: given building characteristics, a weather time series, and an occupancy time series, they generate 15-minute-interval heating, cooling, electricity, and hot-water usage data for any building in the US.
It has many other non-NLP applications.