The #1 misconception when working with large language models is thinking that a capability is a property of the model, rather than the model + input. It may be simultaneously true that ChatGPT has an elo of 100 when given a conversational message and an elo of 1400 when given an optimized message (e.g., strings that resemble chess games, with many examples present in the conversation).
Understanding this concept is crucial for getting good results out of large language models.