Reinforcement Learning as a fine-tuning paradigm (opens in new tab)

(ankeshanand.com)

22 pointsankeshanand4y ago7 comments

7 comments

The fruits of massive language modeling are coming to RL. I envision such foundation models becoming cheap and standardized, like an AI operating system. If we could have a cheap, compact, multi-modal GPT-3 chip we could make all sorts of agents run on top. These RL agents would be like the libraries of skills in Matrix, you can load any skill you want on the player.

YetAnotherNick4y ago

In India without VPN:

"The website has been blocked as per order of Ministry of Electronics and Information Technology under IT Act, 2000."

ankeshanandOP4y ago

Looks like any Github pages served with CloudFlare are getting blocked, I am trying out a fix.

armanboyaci4y ago

> Other learning paradigms are about minimization; reinforcement learning is about maximization.

I don't see why this is important.

Matumio4y ago

I think they wanted to express that learning to predict the correct output ("error minimization") puts a limit on the achievable performance. While ranking (not just RL, really) allows to improve beyond the current best-known answer.

patresh4y ago

Also the next point

> It should have (and has shown to have) better scaling laws

is a statement based on two anecdotes but I don't see a compelling reason why this should be the case in general.

Active learning approaches are not mentioned even though they allow incorporating human feedback during the fine-tuning process and this can be done with a purely supervised approach.

IMO the last point is the only compelling one : having for example agents that can browse the web during learning could open a lot of possibilities. It would have been interesting to develop this last point more : what are the current difficulties in training such agents?

ankeshanandOP4y ago

It's important in the context that RL does not have performance ceilings.

j / k navigate · click thread line to collapse

7 comments

visarga4y ago

YetAnotherNick4y ago

In India without VPN:

"The website has been blocked as per order of Ministry of Electronics and Information Technology under IT Act, 2000."

ankeshanandOP4y ago

Looks like any Github pages served with CloudFlare are getting blocked, I am trying out a fix.

armanboyaci4y ago

> Other learning paradigms are about minimization; reinforcement learning is about maximization.

I don't see why this is important.

Matumio4y ago

patresh4y ago

Also the next point

> It should have (and has shown to have) better scaling laws

is a statement based on two anecdotes but I don't see a compelling reason why this should be the case in general.

Active learning approaches are not mentioned even though they allow incorporating human feedback during the fine-tuning process and this can be done with a purely supervised approach.

ankeshanandOP4y ago

It's important in the context that RL does not have performance ceilings.

j / k navigate · click thread line to collapse