I've seen firsthand how well it works in a study with 40k kids in the US, Norway, and France. The study was led by the Center for Game Science at the University of Washington.
Paper: Zoran Popović. Achieving 96% Mastery at National Scale through Inspired Learning and Generative Adaptivity. Proceedings of the Second (2015) ACM Conference on Learning @ Scale, 2015.
http://dl.acm.org/ft_gateway.cfm?id=2724684&ftid=1550376&dwn...
Post-test success, as used in the paper you mentioned, can be a misleading indicator if the transfer to pen and paper was handled improperly. Kids solving equations in a game move terms around by touch on a screen, so they need to transfer that learning to pen and paper before they can write a line-by-line solution. Skipping this transfer would predictably lead to poor post-test performance, without properly measuring how much algebra had actually been learned and understood.