You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Oct 7, 2024. It is now read-only.
I am observing a strange behavior by the tensorflow default boot dqn agent that I am a bit baffled by.
When running sweeps over multiple environments, the agent loses its expected behavior after the first iteration and does not seem to explore. I've tried to debug for some time but haven't figured out the cause.
Code for reproduction (double-checked in a newly installed env):
import bsuite
from bsuite.baselines.tf import boot_dqn
from bsuite import sweep
from bsuite.baselines import experiment
bsuite_id = "DEEP_SEA"
log_dir = "./logs/"
bsuite_sweep = getattr(sweep, bsuite_id)[:3]
for id in bsuite_sweep:
env = bsuite.load_and_record(id, save_path=log_dir, overwrite=True)
agent = boot_dqn.default_agent(
obs_spec=env.observation_spec(),
action_spec=env.action_spec(),
)
experiment.run(agent, env, num_episodes=300)
Iterations 2 and 3 do not reach the end of the chain in 300 episodes and neither in very long training horizons (see also the colab link for results).
In contrast, the jax agent produces the expected results reliably in this loop (i.e., by replacing <bsuite.baselines.tf> with <bsuite.baselines.jax>).
The text was updated successfully, but these errors were encountered:
anyboby
changed the title
Tensorflow BOOT DQN agent loses performance after iterations
Tensorflow BOOT DQN agent loses performance after first iteration
Jan 5, 2023
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Hi,
I am observing a strange behavior by the tensorflow default boot dqn agent that I am a bit baffled by.
When running sweeps over multiple environments, the agent loses its expected behavior after the first iteration and does not seem to explore. I've tried to debug for some time but haven't figured out the cause.
Code for reproduction (double-checked in a newly installed env):
Iterations 2 and 3 do not reach the end of the chain in 300 episodes and neither in very long training horizons (see also the colab link for results).
In contrast, the jax agent produces the expected results reliably in this loop (i.e., by replacing <bsuite.baselines.tf> with <bsuite.baselines.jax>).
The same can be observed in colab:
https://colab.research.google.com/drive/1hnJMDLG-aXCKKsjFqVd6YWGY4luz29ku?usp=sharing
best,
anyboby
The text was updated successfully, but these errors were encountered: