This article stated the opposite: GPT-4 couldn't play chess while GPT-3.5 could. So this is a case where the model got dumber.