benchmark.md agents hyperparameters #38
Hello,
For each trained agent, you have a `config.yml` in its folder (ex: for TD3 on HalfCheetahBulletEnv-v0). Note: this was not present in the early versions of the rl zoo; in that case you need to look at the yaml files instead (ex: for A2C on Atari games).
Please note that this is not a proper benchmark, in the sense that the reported values correspond to only one seed. It is more meant to check algorithm (maximal) performance, find potential bugs and also to let people have pretrained agents available.
Okay, I see.
Those are present in the corresponding trained agent folders: there are config files for each one of those.
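As an illustration, a minimal sketch of reading such a config file with PyYAML (the path is just the DDPG BipedalWalker example referenced later in this thread; adjust it to the agent you care about):

```python
# Sketch only: inspect the hyperparameters saved next to a trained agent.
import yaml

config_path = "trained_agents/ddpg/BipedalWalker-v2/config.yml"
with open(config_path) as f:
    # If the file contains Python-specific tags (e.g. OrderedDict),
    # yaml.unsafe_load may be needed instead of safe_load.
    hyperparams = yaml.safe_load(f)

for key, value in hyperparams.items():
    print(key, ":", value)
```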
So, could you please confirm that I got all hyperparameters right?
atari-dqn (MsPacmanNoFrameskip and EnduroNoFrameskip)
ddpg (BipedalWalker-v2 and BipedalWalkerHardcore-v2)
sac (BipedalWalker-v2 and BipedalWalkerHardcore-v2)
overall
Thanks!
The benchmark is done only at the end of training. The number of training timesteps is also in the config file; for Atari, it is the standard 10M steps (so 40M steps in the real env because of the frame skip); for the others, check the config files.
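The 10M vs 40M figure is just the frame-skip factor at work; roughly:

```python
# Atari wrappers typically repeat each action for 4 emulator frames (frame skip),
# so 10M agent steps correspond to about 40M frames in the underlying env.
agent_steps = 10_000_000
frame_skip = 4
env_frames = agent_steps * frame_skip
print(env_frames)  # 40000000
```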
yes
yes
Looks good; note that this is a prioritized double dueling DQN.
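For reference, a rough sketch of how those three extensions map onto the stable-baselines (v2) DQN API; the values shown are placeholders, not the zoo's actual hyperparameters, and the Atari env would normally be wrapped (frame skip, frame stacking, etc.) first:

```python
from stable_baselines import DQN

# Placeholder values: the real hyperparameters are in the config/yaml files.
model = DQN(
    "CnnPolicy",
    "MsPacmanNoFrameskip-v4",          # normally wrapped with the Atari wrappers
    double_q=True,                      # Double DQN
    prioritized_replay=True,            # prioritized experience replay
    policy_kwargs=dict(dueling=True),   # dueling network architecture
    verbose=1,
)
model.learn(total_timesteps=int(10e6))
```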
Those do not look like the ones found in https://github.com/araffin/rl-baselines-zoo/blob/master/trained_agents/ddpg/BipedalWalker-v2/config.yml. Yes, you will need several seeds to have a good one with DDPG. Also, I did not manage to make it work with the Hardcore version yet.
For SAC, the learning rate is linearly annealed (it helps to avoid a catastrophic drop in performance).
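A minimal sketch of that linear annealing, assuming the stable-baselines convention where a learning-rate callable receives the remaining training progress (1 at the start, 0 at the end):

```python
def linear_schedule(initial_value):
    """Return a schedule that decays linearly from initial_value to 0."""
    def schedule(progress_remaining):
        # progress_remaining goes from 1 (start of training) to 0 (end)
        return progress_remaining * initial_value
    return schedule

# e.g. SAC("MlpPolicy", env, learning_rate=linear_schedule(3e-4))
```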
it is 10e6 in the config file...
It is done by SAC automatically using the stochastic policy.
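If it helps, a small sketch of the stochastic vs. deterministic use of a SAC policy in stable-baselines (Pendulum is just a stand-in env for the example):

```python
import gym
from stable_baselines import SAC

env = gym.make("Pendulum-v0")             # stand-in env for the example
model = SAC("MlpPolicy", env, verbose=0)

obs = env.reset()
stochastic_action, _ = model.predict(obs, deterministic=False)    # sample from the policy (exploration)
deterministic_action, _ = model.predict(obs, deterministic=True)  # mean action (typical for evaluation)
```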
Thanks for the reply, now it looks much more realistic :).
Yes, I could fix either the number of episodes or the number of steps; I chose the latter.
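A sketch of what evaluating over a fixed number of steps (rather than episodes) can look like; the function name and the step budget are illustrative only:

```python
def evaluate_fixed_steps(model, env, n_steps=150_000, deterministic=True):
    """Run the agent for a fixed step budget and average completed episodes."""
    episode_rewards, current_reward = [], 0.0
    obs = env.reset()
    for _ in range(n_steps):
        action, _ = model.predict(obs, deterministic=deterministic)
        obs, reward, done, _ = env.step(action)
        current_reward += reward
        if done:
            episode_rewards.append(current_reward)
            current_reward = 0.0
            obs = env.reset()
    # Mean reward over the episodes that finished within the budget
    return sum(episode_rewards) / max(len(episode_rewards), 1)
```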
Hi,
Thanks for the amazing lib, an open-source RL benchmark is really valuable nowadays.
Nevertheless, I am wondering where I can find the hyperparameters used for the benchmarked agents? Like network architecture, optimizer parameters and other important RL stuff ;)