
whimsicalism 3y ago

Fastest way to show what? That you should train with the largest LoRA you can? The only upside of a smaller LoRA is shorter training time, and if you are already able to train a DyLoRA with max rank 8, then you should just train a LoRA with that rank.


fancyfredbot 3y ago

You get diminishing returns as you increase the rank, so with a fixed training budget it's not clear whether you get the best return from increasing the rank or from increasing something else. If you start off by training a DyLoRA with max rank 8, you might see returns diminish fast beyond rank 5. Then you can use rank 5 for the rest of your training. You wouldn't know that with LoRA. I think this is the idea behind the paper. If you are just going to use your entire budget training a DyLoRA with max rank 8, then you're right, there's no advantage over a LoRA with rank 8. You'd have to use the ability to assess multiple ranks in order to see some benefit.
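To make the "see where returns diminish" step concrete, here is a minimal sketch of how you might pick the rank once you have a validation score for each rank. The function name, the tolerance, and the scores are all made up for illustration; none of them come from the paper.

```python
def smallest_sufficient_rank(scores, tolerance=0.005):
    """Return the smallest rank whose score is within `tolerance` of the best.

    `scores` maps rank -> validation score (higher is better).
    Both the name and the default tolerance are hypothetical.
    """
    best = max(scores.values())
    for rank in sorted(scores):
        if best - scores[rank] <= tolerance:
            return rank


# Made-up validation accuracies for ranks 1..8, shaped to flatten after rank 5.
illustrative_scores = {
    1: 0.800, 2: 0.850, 3: 0.880, 4: 0.895,
    5: 0.903, 6: 0.904, 7: 0.905, 8: 0.906,
}

chosen_rank = smallest_sufficient_rank(illustrative_scores)
print(chosen_rank)
```

With plain LoRA you would need a separate training run per rank to fill in a table like this; the point of the comment above is that one DyLoRA run could supply all eight entries.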

whimsicalism (OP) 3y ago

I can see that. But are we sure that a rank-based difference that doesn't manifest early in the training process won't manifest as you get further along? See also 'grokking' [0]

[0]: https://arxiv.org/abs/2201.02177

fancyfredbot 3y ago

Not sure there's any way to know beforehand whether that would happen, but the advantage of DyLoRA is that at least you will know afterwards whether you really needed the full rank, whereas with LoRA you wouldn't. In some cases that might not be valuable information, but I guess you'd rather know than not.

Ldorigo 3y ago

Why is the only advantage at training time? I might be misunderstanding something, but with this method you can train once, then deploy models at an arbitrary rank (according to the end user's compute requirements) and expect each one to perform about as well as a model trained specifically for that rank.
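A rough sketch of what "deploy at an arbitrary rank" could look like, assuming the usual LoRA factorisation where the weight update is B times A, and assuming training has ordered the rank components so the leading ones matter most. The function name is hypothetical, the matrices are toy values, and the usual alpha/r scaling factor is left out for brevity.

```python
def truncated_update(B, A, rank):
    """Compute B[:, :rank] @ A[:rank, :] with plain Python lists.

    B is d x r_max and A is r_max x k, each given as a list of rows.
    Only the first `rank` columns of B and rows of A are used.
    """
    r_max = len(A)
    if not 1 <= rank <= r_max:
        raise ValueError(f"rank must be between 1 and {r_max}")
    k = len(A[0])
    return [
        [sum(row_b[i] * A[i][j] for i in range(rank)) for j in range(k)]
        for row_b in B
    ]


# Toy factors with r_max = 2 (illustrative values only).
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 x 2
A = [[1.0, 0.0], [0.0, 1.0]]              # 2 x 2

low_rank_delta = truncated_update(B, A, 1)   # cheaper deployment
full_rank_delta = truncated_update(B, A, 2)  # same as B @ A
```

The same stored factors serve both deployments; only the slice changes. Whether the truncated update actually performs well at the smaller rank depends on how the adapter was trained, which is the property claimed for DyLoRA in this thread.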
