Definitely agree that a lot of work going into hyperparameter tuning and maturing the ecosystem will be key here!
I see the Mamba paper as Mamba's `Attention Is All You Need` moment - it might take a while before everything is optimised to the point of a GPT-4 (it took 6 years for transformers, but it should be faster now with all the attention on ML).