Fair enough, and it's easy to test :)
I made a non-iterative DFS, but hardcoded the maximum depth to the optimal solution. On a trivial puzzle that should take <1ms, it takes 6 seconds. The BFS visits 303 states, the DFS 9M (but only 237 unique ones). This is with deduping the states on the path. If we allow the same state on the path multiple times, it'd be 12s, 50M visited states, 239 unique states. This is with a random visiting order, it'd be a bit worse with a static visiting order.
This is obviously not fully optimized code (whee, std::unordered_set). But fixing that won't help when we're off by 3-4 orders of magnitude. I think the shape of the search graph of this game just isn't well suited to any form of DFS, there are far too many alternate paths to each state.