Fastest way to show what? That you should train with the maximum-sized LoRA you can? Because the only upside to a smaller LoRA is training time, and if you are already able to train a DyLoRA with max rank 8, then you should just train a LoRA at that rank.
You get diminishing returns as you increase the rank, so with a fixed training budget it's not clear whether you get the best return from increasing rank vs increasing something else. If you start off by training a DyLoRA with max rank 8, you can see returns diminish fast beyond, say, rank 5. Then you can use rank 5 for the rest of your training. You wouldn't know that with plain LoRA. I think this is the idea behind the paper. If you are just going to use your entire budget training a DyLoRA with max rank 8, then you're right, there's no advantage over LoRA with rank 8. You'd have to use the ability to assess multiple ranks in order to see some benefit.
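To make the "assess multiple ranks from one adapter" point concrete, here's a toy sketch (not the paper's code): SVD factors stand in for a trained DyLoRA adapter, since an SVD's leading components are nested the same way DyLoRA's are, so one set of factors lets you read off the residual at every rank up to the max.

```python
import numpy as np

rng = np.random.default_rng(0)
# Build a target update whose components decay in magnitude,
# so returns diminish as rank grows.
target = sum((0.5 ** k) * np.outer(rng.normal(size=32), rng.normal(size=32))
             for k in range(8))

# Nested factors: the first r columns of B and rows of A give the
# best rank-r approximation (Eckart-Young), analogous to truncating
# a DyLoRA adapter to rank r.
U, S, Vt = np.linalg.svd(target)
B, A = U * S, Vt

for r in range(1, 9):
    err = np.linalg.norm(target - B[:, :r] @ A[:r, :])
    print(f"rank {r}: residual {err:.4f}")
```

Watching where the residuals flatten out is the single-run rank sweep the comment describes; with separate fixed-rank LoRAs you'd need one training run per rank to see the same curve.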
I can see that. But are we sure that a rank-based difference that doesn't manifest early in the training process won't manifest as you get further along? See also 'grokking' [0]
Why is the only advantage at training time? I might misunderstand something, but with this method you can train once and then deploy models that use arbitrary rank (according to end users' compute requirements), and expect a model that performs well at each specific rank.
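The deployment side of that claim is cheap to sketch. Below, `B` and `A` are random stand-ins for trained DyLoRA factors, and `truncate_adapter` is a hypothetical helper (not from the paper): slicing off a prefix of the factors yields a smaller adapter with no retraining.

```python
import numpy as np

d_out, d_in, max_rank = 64, 64, 8
rng = np.random.default_rng(1)
B = rng.normal(size=(d_out, max_rank))  # stand-in for trained factors
A = rng.normal(size=(max_rank, d_in))

def truncate_adapter(B, A, rank):
    """Keep only the first `rank` components; the prefix is itself a
    valid adapter because DyLoRA trains every prefix to work alone."""
    return B[:, :rank].copy(), A[:rank, :].copy()

# One training run, many deployment budgets:
B4, A4 = truncate_adapter(B, A, 4)
assert B4.shape == (64, 4) and A4.shape == (4, 64)
```

The weight update at rank 4 is then `B4 @ A4`, which costs half the adapter memory and FLOPs of the full rank-8 version.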