
bubblyworld 1y ago

Right, my thought was that this would be way too slow for episode rollout (versus an accelerated implementation in jax or something), but I guess not!


wegfawefgawefg 1y ago

Well, that's the golden issue with RL: sample efficiency. It is env-bounded, so you want an architecture that extracts the max possible information from each collected sample, avoids catastrophic forgetting, and prioritizes samples according to relevance.
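
One common way to "prioritize samples according to relevance" is proportional prioritized experience replay, where transitions with larger TD error are replayed more often. The comment does not say which method it has in mind, so treat the following as a minimal sketch under that assumption. The class name, `alpha`, and `eps` are illustrative choices, and the importance-sampling correction used in full implementations is left out.

```python
import random


class PrioritizedReplayBuffer:
    """Toy proportional prioritized replay buffer (all names are hypothetical)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0      # slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are likely
        # to be sampled before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        # Sample with replacement, with probability proportional to priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idxs = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return idxs, [self.items[i] for i in idxs]

    def update(self, idxs, td_errors):
        # After a learning step, relevance is re-estimated from |TD error|.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Usage would look like `idxs, batch = buf.sample(32)` followed by `buf.update(idxs, td_errors)` after each gradient step. The list-based sampling here is O(n) per draw; real implementations typically use a sum tree to keep that cost down.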
