AlphaProof constructed a large curriculum of tasks by randomly misformalizing math olympiad problems; creating many problems of varying difficulty, from trivial to full math olympiad difficulty.
Wow, the 'task mutations' idea is briliant! Your insights are always so sharp.
Regarding 11: https://arxiv.org/abs/2210.10760 Scaling Laws for Reward Model Overoptimization, Gao et al, 2022.
Wow, the 'task mutations' idea is briliant! Your insights are always so sharp.
Regarding 11: https://arxiv.org/abs/2210.10760 Scaling Laws for Reward Model Overoptimization, Gao et al, 2022.