
whimsicalism 2y ago

Yes to the first.

As for the second, I'm not sure the RAM requirements for training are the same, because you have to preserve the state, which takes extra memory.


alchemist1e9 2y ago

Is it possible for many people to simultaneously fine-tune models on different data and then combine the resulting models into something improved?

fancyfredbot 2y ago

One approach is to have the model learn to select among several separately fine-tuned adapters by learning which adapter works best in a given context. At any given time it is only really using one adapter, but it can switch to another. In this case one adapter can't really improve another, but the overall effect might be a model that is improved across a variety of contexts.
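A rough sketch of what "select one adapter per context" could look like, assuming LoRA-style adapters (a pair of low-rank factors B and A added to a base weight W) and a simple linear gate that scores each adapter against a context vector. The names `select_adapter`, `apply_with_adapter`, and `gate_weights` are made up for illustration, and the gate here is hand-set rather than learned:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def select_adapter(context, gate_weights):
    """Return the index of the adapter with the highest gate score.

    gate_weights[i] is a (hypothetical) score vector for adapter i;
    in a real system these would be learned, here they are fixed.
    """
    scores = [dot(w, context) for w in gate_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def apply_with_adapter(x, base_w, adapters, idx):
    """Compute (W + B A) x as W x + B (A x), using only adapter idx."""
    b, a = adapters[idx]
    base_out = [dot(row, x) for row in base_w]
    low_rank = [dot(row, x) for row in a]        # A x, length k
    delta = [dot(row, low_rank) for row in b]    # B (A x), length d_out
    return [p + q for p, q in zip(base_out, delta)]

# Toy setup: 2x2 identity base weight, two rank-1 adapters.
base_w = [[1.0, 0.0], [0.0, 1.0]]
adapters = [
    ([[1.0], [0.0]], [[1.0, 0.0]]),  # adapter 0: boosts the first output
    ([[0.0], [1.0]], [[0.0, 2.0]]),  # adapter 1: boosts the second output
]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]

context = [0.2, 0.9]
idx = select_adapter(context, gate_weights)
y = apply_with_adapter([1.0, 1.0], base_w, adapters, idx)
print(idx, y)
```

Only the chosen adapter contributes to the output, which matches the point above: the adapters never improve each other, they just cover different contexts.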

yorwba 2y ago

Yes, but the naïve way to combine rank-k adaptations created by n different people would be to concatenate them into a rank-nk adaptation, which wouldn't be as lightweight or as easy to share, so you'd likely be better off mushing them into the baseline model.
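To make the concatenation concrete, here is a small sketch assuming each adaptation is a LoRA-style update ΔW_i = B_i A_i, with B_i of shape d_out × k and A_i of shape k × d_in (the LoRA framing and all helper names are assumptions for illustration). Summing n such updates equals a single update whose B factors are placed side by side and whose A factors are stacked, so the inner dimension grows to nk. "Mushing into the baseline" is read here as simply adding every update to W, which is one plausible interpretation among several (averaging would be another):

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def concat_adapters(adapters):
    """Turn [(B_1, A_1), ..., (B_n, A_n)] into one (B, A) pair of inner size n*k."""
    d_out = len(adapters[0][0])
    # Place the B_i side by side: d_out x (n*k).
    b_cat = [sum((b[r] for b, _ in adapters), []) for r in range(d_out)]
    # Stack the A_i on top of each other: (n*k) x d_in.
    a_cat = [row for _, a in adapters for row in a]
    return b_cat, a_cat

def merge_into_base(w, adapters):
    """Return W + sum_i B_i A_i, a full-size matrix with no adapter left over."""
    merged = [row[:] for row in w]
    for b, a in adapters:
        merged = add(merged, matmul(b, a))
    return merged

rng = random.Random(0)
d_out, d_in, k, n = 4, 5, 2, 3
w = rand_matrix(d_out, d_in, rng)
adapters = [(rand_matrix(d_out, k, rng), rand_matrix(k, d_in, rng)) for _ in range(n)]

b_cat, a_cat = concat_adapters(adapters)
via_concat = add(w, matmul(b_cat, a_cat))
merged = merge_into_base(w, adapters)

# Both routes give the same weights, up to floating-point rounding.
max_diff = max(abs(x - y) for rm, rc in zip(merged, via_concat) for x, y in zip(rm, rc))
print(len(a_cat), max_diff < 1e-9)
```

The concatenated adapter stores nk(d_out + d_in) numbers and keeps growing with n, while the merged matrix stays at d_out × d_in no matter how many contributors there are. Whether the merged weights are actually a better model is an empirical question this sketch does not answer.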

alchemist1e9 2y ago

Can they mathematically be “mushed” and then create an improved model?

I have yet to understand the difference between fine-tuning and training, and therefore whether a distributed, decentralized, eventually consistent training approach is a possibility or simply not realistic.
