[RLlib] POC: Run RLlib w/o Preprocessors setup. #17656
Conversation
…le_batch_supports_complex_spaces
…eprocessors_soft # Conflicts: # rllib/policy/sample_batch.py
…le_batch_supports_complex_spaces
…le_batch_supports_complex_spaces # Conflicts: # rllib/policy/sample_batch.py
…le_batch_supports_complex_spaces
…le_batch_supports_complex_spaces
…le_batch_supports_complex_spaces
…eprocessors_soft # Conflicts: # rllib/agents/trainer.py # rllib/utils/annotations.py
…ecate_preprocessors_soft # Conflicts: # rllib/agents/trainer.py
…ecate_preprocessors_soft # Conflicts: # rllib/evaluation/collectors/simple_list_collector.py
…ecate_preprocessors_soft
…ecate_preprocessors_soft # Conflicts: # rllib/evaluation/collectors/simple_list_collector.py
rllib/execution/replay_buffer.py
Outdated
@@ -402,7 +402,7 @@ def add_batch(self, batch: SampleBatchType) -> None:
     # If SampleBatch has prio-replay weights, average
     # over these to use as a weight for the entire
     # sequence.
-    if "weights" in time_slice:
+    if "weights" in time_slice and time_slice["weights"]:
Avoid np.mean over an empty list (yields NaN).
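A minimal sketch of the failure mode this guard avoids; the dict stand-in and the None fallback are assumptions for illustration, not the PR's actual code:

import numpy as np

# time_slice stands in for a SampleBatch slice. Without the extra
# truthiness check, np.mean([]) emits a RuntimeWarning and returns NaN,
# which would then be used as the priority weight for the sequence.
time_slice = {"weights": []}

if "weights" in time_slice and time_slice["weights"]:
    weight = np.mean(time_slice["weights"])
else:
    weight = None  # assumption: fall back to the buffer's default priority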
…ecate_preprocessors_soft
…ecate_preprocessors_soft
@@ -416,6 +416,8 @@ def postprocess_nstep_and_prio(policy: Policy,
     batch[SampleBatch.REWARDS], batch[SampleBatch.NEXT_OBS],
     batch[SampleBatch.DONES])

+    # Create dummy prio-weights (1.0) in case we don't have any in
Probably should be in a different PR.
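For context, a hedged sketch of what "dummy prio-weights (1.0)" could look like here; the helper name, key strings, and the np.ones_like choice are assumptions, not necessarily the PR's exact code:

import numpy as np

def ensure_prio_weights(batch: dict) -> dict:
    # Hypothetical helper: if the incoming batch carries no prioritized-replay
    # weights yet, fill in neutral weights of 1.0 (one per timestep) so that
    # downstream prioritized-replay code always finds a "weights" column.
    if "weights" not in batch:
        batch["weights"] = np.ones_like(batch["rewards"], dtype=np.float32)
    return batch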
# env's action space before sending actions back to the env.
# (0.0 centered with small stddev; only affecting Box components).
# We will unsquash actions (and clip, just in case) to the bounds of
# the env's action space before sending actions back to the env.
"normalize_actions": True,
# If True, RLlib will clip actions according to the env's bounds
# before sending them back to the env.
# TODO: (sven) This option should be obsoleted and always be False.
"clip_actions": False,
# Whether to use "rllib" or "deepmind" preprocessors by default
Maybe describe in the comment what "rllib" or "deepmind" does.
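For reference, a hedged sketch of what such a comment might say; the wording is a suggestion, not the PR's actual text:

config = {
    # "deepmind": wrap (Atari) envs with DeepMind-style preprocessing
    #             (grayscale, downscaling, frame-stacking).
    # "rllib":    use RLlib's built-in generic preprocessors instead.
    # None (new in this PR): disable preprocessing entirely; observations
    #             reach the model exactly as the env returns them.
    "preprocessor_pref": "deepmind",
}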
self.buffers[SampleBatch.AGENT_INDEX][0].append(agent_index)
self.buffers[SampleBatch.ENV_ID][0].append(env_id)
self.buffers[SampleBatch.T][0].append(t)
self.buffers[SampleBatch.EPS_ID][0].append(self.episode_id)
If self.episode_id and unroll_id are constant, why repeatedly append the same data?
if SampleBatch.EPS_ID in values:
    assert values[SampleBatch.EPS_ID] == self.episode_id
    del values[SampleBatch.EPS_ID]
self.buffers[SampleBatch.EPS_ID][0].append(self.episode_id)
Same issue as above.
    shape=shape,
    name=".".join([str(p) for p in path]),
)

What does this do?
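A hedged reading of the name=".".join(...) pattern above: each leaf of a flattened complex space appears to get a placeholder named by its dotted path, so components can be told apart unambiguously. A minimal sketch of that naming scheme (the nested structure below is invented for illustration):

import numpy as np
import tree  # dm-tree, an RLlib dependency

nested_obs = {"sensors": {"cam": np.zeros(3)}, "task_id": 4}
for path, leaf in tree.flatten_with_path(nested_obs):
    # Paths such as ("sensors", "cam") become names like "sensors.cam".
    print(".".join(str(p) for p in path))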
This PR prepares for soon allowing individual observation components to be addressed by the trajectory view API, e.g. to enable frame-stacking for individual components within a complex observation space (Tuple|Dict). Soft-deprecating RLlib's Preprocessor API should also increase transparency for users and allow batched, model-based preprocessing of observations. Observations will arrive at the model exactly as they are returned by the env.
This PR is a POC that works for tf and torch.
preprocessor_pref: None; set to None to disable Preprocessors altogether (not even using the NoPreprocessor class anymore).
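A sketch of how the new setting might be used; the trainer class, env, and framework choice are illustrative (API as of this PR's Ray version), not prescribed by the PR:

from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "CartPole-v0",
    # Disable Preprocessors entirely: observations are passed to the model
    # exactly as the env returns them (no flattening, no Atari wrapping).
    "preprocessor_pref": None,
    "framework": "tf",  # the POC is stated to work for tf and torch
}
trainer = PPOTrainer(config=config)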
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.