bradhilton 1y ago

No meaningful changes to the hyperparameters. We just set the tasks per iteration to 16 and trained on the same first 16 training tasks every iteration.
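For anyone who wants to reproduce the setup, here is a minimal sketch of the task selection described above. It is not the actual training code: `tasks_for_iteration`, `fixed_subset`, and the rotating fallback are hypothetical names and behavior added only for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tasks_for_iteration(
    train_tasks: Sequence[T],
    iteration: int,
    tasks_per_iteration: int = 16,
    fixed_subset: bool = True,
) -> List[T]:
    """Pick the tasks to train on for one iteration (hypothetical helper)."""
    if fixed_subset:
        # Few-sample setup: reuse the same first N training tasks every iteration.
        return list(train_tasks[:tasks_per_iteration])
    # Assumed contrast case: walk through the training set, wrapping at the end.
    n = len(train_tasks)
    start = (iteration * tasks_per_iteration) % n
    return [train_tasks[(start + i) % n] for i in range(tasks_per_iteration)]


if __name__ == "__main__":
    tasks = [f"task-{i}" for i in range(100)]
    # With fixed_subset=True, every iteration sees the identical 16 tasks.
    print(tasks_for_iteration(tasks, iteration=0) == tasks_for_iteration(tasks, iteration=20))
```

With `fixed_subset=True` the iteration index is ignored, which is the whole point: the model only ever sees 16 distinct tasks.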

We only tested this with the 14B model. You can see the run here:

https://wandb.ai/bradhilton/rl-experiments/runs/062

Performance peaked at 45% accuracy after 21 iterations, below the final 59%, but that is still a significant increase from very few samples.

pama 1y ago

Thanks.
