It’s much more than just an LLM. The Mamba architecture is often used as the backbone of an LLM, but you can use it more generally as a linear-time (as opposed to quadratic-time) sequence modeling architecture (as per the original paper’s title, which is cited in the linked repo). It is much closer to a convolutional network or an RNN (it has bits of both) than to a transformer. It is based on the notion of state spaces (with a twist).
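To make the state-space idea concrete, here is a toy sketch (my own illustration, not the actual Mamba kernel, which adds input-dependent parameters and a hardware-aware parallel scan): a discretized linear state-space model updates a hidden state once per timestep, h_t = A h_{t-1} + B x_t, y_t = C h_t, so a sequence of length T costs O(T) rather than the O(T²) of attention.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear-time scan of a toy SSM: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # one pass over the sequence: linear in T
        h = A @ h + B * x_t    # recurrent state update (RNN-like)
        ys.append(C @ h)       # readout to a scalar output
    return np.array(ys)

rng = np.random.default_rng(0)
T, d_state = 16, 4
A = 0.9 * np.eye(d_state)          # stable toy dynamics
B = rng.standard_normal(d_state)
C = rng.standard_normal(d_state)
y = ssm_scan(rng.standard_normal(T), A, B, C)
print(y.shape)
```

The same recurrence can also be unrolled into a convolution with kernel entries C A^k B, which is the dual view that gives these models their CNN flavor during training.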
I use Mamba, for instance, to build surrogate models of physics-based building energy models: given building characteristics, a weather time series, and an occupancy time series, they generate 15-minute-interval heating, cooling, electricity, and hot-water usage data for any building in the US.
It has many other non-NLP applications.