Add wrappers for TorchRL training workflow #1178
base: main
Conversation
Can Vincent Moens also get tagged as a reviewer?
In my opinion, non-wrapper code (e.g. the PPO runner) should not live in the extension. Having the PPO runner code in the extension will generate non-Isaac-Lab-related version changes and changelog records whenever that code needs to be modified.
This reverts commit c896e12.
I realized this causes issues with the environment-specific PPO configurations, which need to import the TorchRL PPO runner cfgs. I can move only the configs to
Hey @fyu-bdai, can you clarify why a single environment doesn't work?
I took another look and fixed the single-env training bug with TorchRL.
Very quick review, happy to discuss each of these points
from typing import TYPE_CHECKING

import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec
You may want to use Composite and Unbounded - we're deprecating UnboundedContinuousTensorSpec.
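For illustration, a sketch of the suggested replacement using the newer spec names, importable from torchrl.data (the shapes below are placeholders, not taken from this PR):

```python
from torchrl.data import Composite, Unbounded

num_envs, obs_dim = 4, 48  # placeholder sizes for illustration
observation_spec = Composite(
    observation=Unbounded(shape=(num_envs, obs_dim)),
    shape=(num_envs,),
)
```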
import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec
from torchrl.envs.libs.gym import GymEnv
Suggested change:
- from torchrl.envs.libs.gym import GymEnv
+ from torchrl.envs import GymEnv
""" | ||
if self._simple_done: | ||
done = tensordict._get_str("done", default=None) |
Suggested change:
- done = tensordict._get_str("done", default=None)
+ done = tensordict.get("done", default=None)
This could be a bit more robust to bc-breaking changes (even though I don't expect any). The overhead is low and can be swallowed by compile.
@@ -0,0 +1,58 @@
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
You probably want to look at these tutorials:
https://pytorch.org/rl/main/tutorials/export.html
https://pytorch.org/tensordict/main/tutorials/export.html
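For context, a minimal sketch of what those tutorials cover: a TensorDictModule policy accepts keyword tensors through its dispatch feature, so it can be traced with torch.export without constructing a TensorDict. The network and shapes below are placeholders, not code from this PR.

```python
import torch
from tensordict.nn import TensorDictModule

# Placeholder policy: maps an "observation" entry to an "action" entry.
policy = TensorDictModule(torch.nn.Linear(48, 12), in_keys=["observation"], out_keys=["action"])

obs = torch.randn(1, 48)
action = policy(observation=obs)  # dispatch: plain tensors in, plain tensor out

# Export the policy as a standalone graph for deployment.
exported_policy = torch.export.export(policy, args=(), kwargs={"observation": obs})
```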
"""Checks the done keys of the input tensordict. Unlike the base GymWrapper implementation, we do not | ||
call env.reset() |
self._curr_ep_len[new_ids] = 0
self._curr_reward_sum[new_ids] = 0
tensordict.set("episode_length", self._ep_len_buf)
Here too we could use StepCounter.
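For illustration, a sketch of that direction: stock TorchRL transforms can take over the per-episode bookkeeping the wrapper currently does by hand. Pendulum stands in for the Isaac Lab env, and the RewardSum transform shown next to StepCounter is an extra assumption covering the reward-sum buffer:

```python
import gymnasium as gym
from torchrl.envs import GymWrapper, TransformedEnv
from torchrl.envs.transforms import Compose, RewardSum, StepCounter

# Pendulum is only a stand-in for the Isaac Lab environment.
base_env = GymWrapper(gym.make("Pendulum-v1"))
env = TransformedEnv(
    base_env,
    Compose(
        StepCounter(),  # writes a per-episode "step_count" entry
        RewardSum(),    # accumulates per-episode returns into "episode_reward"
    ),
)
td = env.rollout(5)  # the rollout now carries both entries automatically
```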
from omni.isaac.lab.envs import ManagerBasedRLEnv


class TorchRLEnvWrapper(GymWrapper):
I think we can get around most of the extra features here with torchrl transforms and by telling GymWrapper that the env is auto-resetting - happy to help getting this done.
That will be more robust long term, as this implementation relies on a bunch of private features of torchrl that we could modify in the future.
# The portion of this inherited method that handles CUDA streams has been removed,
# as it caused CUDA memory allocation issues with Isaac Sim during env stepping.
total_frames = self.total_frames
Gotcha. I think we can solve this by providing a no_stream arg in the collector - happy to make it a feature if you think that would help.
yield tensordict_out.clone()


class ClipPPOLossWrapper(ClipPPOLoss):
Can you comment on what has been changed here?
There are many versions of PPO and we probably don't cover exactly what you want it to do but that can be fixed!
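For reference, a minimal setup of the stock ClipPPOLoss as it exists in TorchRL (this is not the ClipPPOLossWrapper from this PR; the actor/critic networks and sizes are placeholder assumptions):

```python
import torch
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import ProbabilisticActor, TanhNormal, ValueOperator
from torchrl.objectives import ClipPPOLoss

obs_dim, act_dim = 48, 12  # placeholder sizes

# Actor: observation -> (loc, scale) -> TanhNormal policy.
actor_net = torch.nn.Sequential(torch.nn.Linear(obs_dim, 2 * act_dim), NormalParamExtractor())
actor_module = TensorDictModule(actor_net, in_keys=["observation"], out_keys=["loc", "scale"])
actor = ProbabilisticActor(actor_module, in_keys=["loc", "scale"], distribution_class=TanhNormal)

# Critic: observation -> state value.
critic = ValueOperator(torch.nn.Linear(obs_dim, 1), in_keys=["observation"])

loss_module = ClipPPOLoss(actor_network=actor, critic_network=critic, clip_epsilon=0.2)
```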
return td_out


class TrainerWrapper(Trainer):
The trainer class hasn't been touched in a long time. As discussed, I'm considering subclassing it for the various losses we have, starting with PPO.
Description
Adds TorchRL module wrappers, a PPO runner, and a PPO runner cfg for training Isaac Lab environments with TorchRL.
This PR is the first in a series of three that together add a complete training pipeline for the Anymal-D training environment using TorchRL. It contains the core wrapper modules and should be merged first.
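For orientation, a hypothetical sketch of how the new env wrapper might be used; only the class name TorchRLEnvWrapper appears in the diff above, so the import path and single-argument constructor are assumptions, not confirmed by this PR:

```python
from omni.isaac.lab.envs import ManagerBasedRLEnv, ManagerBasedRLEnvCfg

# Assumed location of the wrapper added by this PR (hypothetical path).
from omni.isaac.lab_tasks.utils.wrappers.torchrl import TorchRLEnvWrapper


def make_torchrl_env(env_cfg: ManagerBasedRLEnvCfg) -> TorchRLEnvWrapper:
    """Create an Isaac Lab env and expose it to TorchRL (constructor signature assumed)."""
    env = ManagerBasedRLEnv(cfg=env_cfg)
    return TorchRLEnvWrapper(env)
```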
Related PRs:
- #1179 adds the TorchRL play and train scripts.
- #1180 adds the Anymal-D TorchRL training configuration.
Fixes #1181
Type of change
Checklist
- I have run the pre-commit checks with ./isaaclab.sh --format
- I have updated the corresponding version in the config/extension.toml file
- I have added my name to CONTRIBUTORS.md or my name already exists there