Generalized on-policy distillation with reward extrapolation | Better HN

(arxiv.org)

3 points | fzliu | 1mo ago | 0 comments

No comments yet.