You can run PPO or DQN on the OpenAI Gym implementation right now using Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
In fact, I ran it locally a while ago, and PPO solved the problem within 10 minutes of training, reaching a maximum reward of about 200.