Performance Check (Discrete actions) #49
Comments
Initial results with PPO: SB3 seems to mostly match the performance of SB PPO2, but with some glaring discrepancies (see the training runs on six games with slightly different action spaces). It seems that at least a few games should be used for evaluation, because in some games the SB3 version gets similar performance (e.g. MsPacman, Q*bert), but in others it does not reach the same numbers (e.g. Breakout, Enduro). I still have to double-check that the parameters were right, etc. |
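As a rough illustration of why several games are needed, here is a minimal sketch that compares per-game scores between two implementations and flags the games where one falls short. The function names, the 10% tolerance, and the scores are all hypothetical placeholders, not results from the runs mentioned above.

```python
def relative_gaps(baseline, candidate):
    """Per-game relative gap: (candidate - baseline) / |baseline|."""
    gaps = {}
    for game, base_score in baseline.items():
        if game not in candidate or base_score == 0:
            continue  # skip games without a usable reference score
        gaps[game] = (candidate[game] - base_score) / abs(base_score)
    return gaps


def flag_regressions(gaps, tolerance=0.1):
    """Sorted names of games where the candidate is worse by more than `tolerance`."""
    return sorted(game for game, gap in gaps.items() if gap < -tolerance)


# Placeholder scores for illustration only; these are NOT measured results.
sb_scores = {"GameA": 100.0, "GameB": 200.0, "GameC": 50.0}
sb3_scores = {"GameA": 98.0, "GameB": 120.0, "GameC": 55.0}

gaps = relative_gaps(sb_scores, sb3_scores)
regressed = flag_regressions(gaps)
print(regressed)
```

With a single game, a match on "GameA" alone would hide the gap on "GameB", which is the situation described above for MsPacman versus Breakout.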
Are you using the zoo? And if so, which wrapper? |
No Zoo, based on this code. These are copied and modified wrappers from SB. The only thing that changes between the SB and SB3 runs is where the algorithm is imported from; the rest is handled by the other code (and is the same). |
Cross Posting:
This is observed with the latest version of DQN. |
Hello all, I've been an avid SB1 user for over a year. An amazing framework with thorough documentation and active support group indeed. |
I will work on DQN next. Could you share what envs/settings you used to get stuck like that with "standard" setup? This is on the suggestions list for v1.2, I believe. At the moment we are working on optimizing the performance even of the synchronous variants, and PyTorch is not making things too easy with its tendency to use too many threads at the same time etc :) |
Completely understand, @Miffyli. Would reviewing https://github.com/alex-petrenko/sample-factory be helpful?
The script that runs the DQN agent:

from stable_baselines3 import DQN
import argparse

if __name__ == "__main__":
    # Arguments forwarded to the DQN constructor.
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", "--learning-rate", type=float, default=1e-4, dest="learning_rate")
    parser.add_argument("env", type=str)
    parser.add_argument("--policy", default="MlpPolicy")
    # Note: type=eval executes the given string; only pass trusted input.
    parser.add_argument("--policy-kwargs", type=eval, default={})
    parser.add_argument("--buffer-size", type=int, default=int(1e5))
    parser.add_argument("--learning-starts", type=int, default=5000)
    parser.add_argument("--batch-size", default=32, type=int)
    parser.add_argument("--tau", type=float, default=1.0)
    parser.add_argument("--gamma", default=0.99, type=float)
    parser.add_argument("--train-freq", type=int, default=4)
    parser.add_argument("--gradient-steps", type=int, default=-1)
    parser.add_argument("--n-episodes-rollout", type=int, default=-1)
    parser.add_argument("--target-update-interval", type=int, default=5000)
    parser.add_argument("--exploration-fraction", type=float, default=0.2)
    parser.add_argument("--exploration-initial-eps", type=float, default=1.0)
    parser.add_argument("--exploration-final-eps", type=float, default=0.05)

    # Arguments forwarded to agent.learn().
    learn = argparse.ArgumentParser()
    learn.add_argument("--n-timesteps", default=int(5e5), type=int, dest="total_timesteps")
    learn.add_argument("--eval-freq", type=int, default=10)
    learn.add_argument("--n-eval-episodes", type=int, default=5)

    # The first parser takes what it recognizes; the leftovers go to the second.
    agent_args, learn_args = parser.parse_known_args()
    learn_args = learn.parse_args(learn_args)

    agent = DQN(**agent_args.__dict__, verbose=2, create_eval_env=True, tensorboard_log=f"tb/dqn_{agent_args.env}")
    agent.learn(**learn_args.__dict__)

The script that I call the above with:
The hyperparameters (except lr) are taken from the zoo. |
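The script above relies on `parse_known_args` to split one command line between the constructor arguments and the `learn()` arguments. The following is a minimal, library-free sketch of that pattern with a reduced set of flags. The helper name `split_args` and the example command line are assumptions made for illustration; no agent is created here.

```python
import argparse


def split_args(argv):
    """Split argv into (agent kwargs, learn kwargs) using two parsers."""
    # Parser for constructor arguments (a small subset of the flags above).
    agent_parser = argparse.ArgumentParser()
    agent_parser.add_argument("env", type=str)
    agent_parser.add_argument("--lr", "--learning-rate", type=float,
                              default=1e-4, dest="learning_rate")
    agent_parser.add_argument("--batch-size", type=int, default=32)

    # Parser for the training-loop arguments.
    learn_parser = argparse.ArgumentParser()
    learn_parser.add_argument("--n-timesteps", type=int, default=int(5e5),
                              dest="total_timesteps")
    learn_parser.add_argument("--eval-freq", type=int, default=10)

    # parse_known_args returns the parsed namespace plus unrecognized tokens,
    # which are then handed to the second parser.
    agent_args, leftover = agent_parser.parse_known_args(argv)
    learn_args = learn_parser.parse_args(leftover)
    return vars(agent_args), vars(learn_args)


# Hypothetical command line, with the environment name given first.
agent_kwargs, learn_kwargs = split_args(
    ["PongNoFrameskip-v4", "--lr", "3e-4", "--n-timesteps", "1000"]
)
print(agent_kwargs)
print(learn_kwargs)
```

One consequence of this design is that a misspelled constructor flag is not rejected by the first parser; it is passed on to the second parser, which then reports it as unrecognized.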
Refactor buffers
The discrete action counterpart of #48
Associated PR: #110
Test envs: Atari Games (Pong - easy, Breakout - medium, ...)