So an RL chess algorithm tells you, statistically, a move (action) from a state S to a new state S' such that your expected reward is maximized. Whereas a chess master (probably) designs his next sequence of moves based on logic ("my opponent will respond in such a way because..."). This is different from "statistically, this move right now has the best odds of leading to a win" à la Monte Carlo. Now what is surprising is that statistical algos are better than our best logicians at this particular task. But the action at a given state is still statistically chosen.
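To make "statistically chosen" concrete, here's a minimal sketch of greedy action selection over learned value estimates. The Q-values are made-up numbers, not from any real chess engine:

```python
# Hypothetical learned Q-values for one chess position (state S):
# each entry estimates the expected cumulative reward of playing that move.
# The numbers are invented purely for illustration.
q_values = {"e2e4": 0.31, "d2d4": 0.28, "g1f3": 0.25, "b1c3": 0.12}

# The agent does not reason about *why* a move is good; it simply
# picks the action with the highest estimated expected reward.
best_move = max(q_values, key=q_values.get)
print(best_move)  # e2e4
```

No chess logic anywhere, just an argmax over estimates.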
Finally, you need your training data to be representative of the underlying distribution you are trying to model. So you need your simulator to be as realistic as possible, whereas in most useful cases (landing a plane, for instance) it is in fact an approximation.
So for instance, if you want an algo to design the flight path of a rocket landing on an asteroid, you could build a simulator modeling spacetime from observations and model its dynamics from Einstein's equations. But then what's the RL for? Why not just use an off-the-shelf optimization algorithm, like we have had for decades? [1]
The Bellman equation and DQNs are nice and all, but they're still statistical algorithms, producing — in my mind — statistical intelligence about a particular system. An RL agent will not tell you WHY such an action was taken, but it will tell you that, statistically, it is the action to take.
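For concreteness, here is one tabular Q-learning step, i.e. the Bellman backup Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. All values, including alpha and gamma, are toy numbers I picked for illustration:

```python
# Assumed hyperparameters: learning rate and discount factor (toy values).
alpha, gamma = 0.1, 0.9

# Toy Q-table: (state, action) -> estimated expected return.
Q = {
    ("s", "a"): 0.5,
    ("s2", "a1"): 1.0,
    ("s2", "a2"): 0.2,
}
reward = 1.0  # observed after taking action "a" in state "s", landing in "s2"

# Bellman backup: nudge Q(s,a) toward r + gamma * max_a' Q(s',a').
best_next = max(Q[("s2", a)] for a in ("a1", "a2"))
Q[("s", "a")] += alpha * (reward + gamma * best_next - Q[("s", "a")])
print(Q[("s", "a")])  # 0.64
```

The update is pure bookkeeping on expected rewards; nowhere does it record a reason the action was good.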
Very neat results in RL, however.
[1] I worked on an RL-based agent to control traffic lights, and it wasn't clear whether our solution was better than a classical optimization one. Actually, classical optimization (minimizing an analytical model of the system) seemed to scale much better to larger meshes.
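As a caricature of that classical alternative: write down an analytical delay model and minimize it directly, no agent, no training. The model and every constant below are invented for illustration:

```python
# Toy analytical model of intersection delay as a function of the green-time
# split g in (0, 1) between two approaches. Demands are made-up constants.
demand_a, demand_b = 2.0, 1.0

def delay(g):
    # Delay grows as each approach's share of green time shrinks.
    return demand_a / g + demand_b / (1 - g)

# Classical optimization: just minimize the model (here, a coarse grid search;
# in practice you'd use a proper solver).
candidates = [i / 1000 for i in range(1, 1000)]
g_star = min(candidates, key=delay)
print(g_star)  # about 0.586 for these demands
```

The whole "policy" is one minimization you can re-run per intersection, which is partly why it scaled better for us.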