I'm referring to the statistical power of the model. For example, if you replace GPT-4 with GPT-2, it will lose every game, because its statistical power is lower. Increasing the statistical power doesn't make the model understand any better; it just makes it more likely to generate a response that aligns with human expectations.