
Add wrappers for TorchRL training workflow #1178

Open · wants to merge 10 commits into main

Conversation

@fyu-bdai (Contributor) commented Oct 8, 2024

Description

Adds TorchRL module wrappers, PPO runner, and PPO runner cfg for training IsaacLab environments with TorchRL.

This PR is the first in a series of three that together add a complete training pipeline for the Anymal-D training environment using TorchRL. It contains the core wrapper modules, which should be merged first.

Related PRs:
#1179 Adds torchrl play and train scripts.
#1180 Adds Anymal-D torchrl training configuration.

Fixes #1181
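For orientation, a minimal usage sketch: TorchRLEnvWrapper appears in this PR's diff, but the module path and task id below are illustrative assumptions, not confirmed by the PR.

import gymnasium as gym

from omni.isaac.lab_tasks.utils.wrappers.torchrl import TorchRLEnvWrapper  # assumed module path

env = gym.make("Isaac-Velocity-Flat-Anymal-D-v0")  # assumed Anymal-D task id
env = TorchRLEnvWrapper(env)  # exposes the Isaac Lab env through TorchRL's EnvBase API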

⚠️ Not Supported

  • Empirical normalization.
  • Recurrent networks.
  • Training with Neptune logger.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@fyu-bdai (Contributor, Author) commented Oct 8, 2024

Can Vincent Moens also get tagged as a reviewer?

@fyu-bdai fyu-bdai changed the title Add Wrappers for TorchRL training workflow Add wrappers for TorchRL training workflow Oct 8, 2024
@fyu-bdai fyu-bdai mentioned this pull request Oct 8, 2024
@Toni-SM (Contributor) commented Oct 8, 2024

In my opinion, non-wrapper code (e.g., the torchrl runner code) should be in the source/standalone/workflows folder.

source/standalone/workflows
├── torchrl
│   ├── ppo/      <-- code here
│   ├── cli_args.py
│   ├── play.py
│   └── train.py

Having the PPO runner code in an extension will generate non-Isaac-Lab-related version changes and changelog records whenever the code needs to be modified.

@fyu-bdai (Contributor, Author) commented Oct 8, 2024

> In my opinion, non-wrapper code (e.g., the torchrl runner code) should be in the source/standalone/workflows folder.
>
> source/standalone/workflows
> ├── torchrl
> │   ├── ppo/      <-- code here
> │   ├── cli_args.py
> │   ├── play.py
> │   └── train.py
>
> Having the PPO runner code in an extension will generate non-Isaac-Lab-related version changes and changelog records whenever the code needs to be modified.

I realized this causes issues with the environment-specific PPO configurations, which need to import the TorchRL PPO runner cfgs. I can move only the configs to omni.isaac.lab_tasks.utils.wrappers, but that would split the locations of the TorchRL runner and its runner cfg. I have elected to move them back for now.
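A sketch of the coupling described above, with assumed class and module names: the per-task agent cfg subclasses the shared runner cfg, which is why the cfg has to live in an importable extension module rather than under source/standalone/workflows.

from omni.isaac.lab_tasks.utils.wrappers.torchrl import TorchRLPpoRunnerCfg  # assumed name/path

class AnymalDFlatPpoRunnerCfg(TorchRLPpoRunnerCfg):  # hypothetical per-task cfg
    """Per-task overrides on top of the shared TorchRL PPO runner configuration."""

    experiment_name = "anymal_d_flat"
    max_iterations = 1500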

@jsmith-bdai (Collaborator) commented Dec 13, 2024

Hey @fyu-bdai, can you clarify why a single environment doesn't work?

@fyu-bdai (Contributor, Author) commented Jan 9, 2025

> Hey @fyu-bdai, can you clarify why a single environment doesn't work?

I took another look and fixed the single-environment training bug with TorchRL.


@vmoens left a comment


Very quick review; happy to discuss each of these points.

from typing import TYPE_CHECKING

import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec

You may want to use Composite and Unbounded - we're deprecating UnboundedContinuousTensorSpec
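A sketch of the suggested migration; the new names are exported from torchrl.data in recent releases and should be drop-in replacements.

import torch

# CompositeSpec -> Composite, UnboundedContinuousTensorSpec -> Unbounded;
# the old names remain as deprecated aliases for a transition period.
from torchrl.data import Composite, Unbounded

reward_spec = Composite(
    episode_reward=Unbounded(shape=(1,), dtype=torch.float32),
)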


import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec
from torchrl.envs.libs.gym import GymEnv

Suggested change
- from torchrl.envs.libs.gym import GymEnv
+ from torchrl.envs import GymEnv

"""
if self._simple_done:
    done = tensordict._get_str("done", default=None)

Suggested change
-    done = tensordict._get_str("done", default=None)
+    done = tensordict.get("done", default=None)

Using the public get() could be a bit more robust to backward-compatibility-breaking changes (even though I don't expect any). The overhead is low and can be absorbed by compile.


Comment on lines +45 to +46
"""Checks the done keys of the input tensordict. Unlike the base GymWrapper implementation, we do not
call env.reset()

Interesting, we bumped into a similar problem with @luisenp and @teopir.

We need better handling of auto-resetting envs in torchrl.

How do you get the last values of a trajectory (if you can get them)?
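For reference, one possible answer under the gymnasium vector-env convention (assumed here; Isaac Lab may expose this differently, or not at all): the true pre-reset observation is stashed in the info dict, which is what a value bootstrap for truncated episodes needs.

import torch

def get_last_values(critic, info, truncated):
    """Bootstrap values for episodes cut off at this step, assuming the
    gymnasium convention that the pre-reset obs lands in ``info``."""
    if "final_observation" not in info:  # env did not expose the terminal obs
        return None
    with torch.no_grad():
        return critic(info["final_observation"])[truncated]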


self._curr_ep_len[new_ids] = 0
self._curr_reward_sum[new_ids] = 0
tensordict.set("episode_length", self._ep_len_buf)

Here too we could use StepCounter.
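For reference, a sketch of that route; StepCounter is an existing TorchRL transform, and base_env stands in for the wrapped Isaac Lab env.

from torchrl.envs import TransformedEnv
from torchrl.envs.transforms import StepCounter

# StepCounter maintains a per-env "step_count" entry in the tensordict and
# zeroes it on reset, replacing the manual episode-length buffers above.
env = TransformedEnv(base_env, StepCounter())  # base_env: placeholder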

from omni.isaac.lab.envs import ManagerBasedRLEnv


class TorchRLEnvWrapper(GymWrapper):

I think we can get around most of the extra features here with torchrl transforms and by telling GymWrapper that the env is auto-resetting; happy to help get this done.
That will be more robust long term, as this implementation relies on a bunch of private torchrl features that we could modify in the future.
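A sketch of that proposal, assuming the auto_reset flag available on recent TorchRL gym wrappers:

from torchrl.envs import GymWrapper, TransformedEnv
from torchrl.envs.transforms import StepCounter

# auto_reset=True tells TorchRL the env resets itself on done (assumed to be
# supported by the installed TorchRL version), so GymWrapper's private reset
# machinery does not need overriding; transforms handle the bookkeeping.
env = GymWrapper(isaac_env, auto_reset=True)  # isaac_env: placeholder Isaac Lab env
env = TransformedEnv(env, StepCounter())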

Comment on lines +302 to +304
# The portion of the code handling cuda streams has been removed in this inherited method, which
# caused CUDA memory allocation issues with IsaacSim during env stepping.
total_frames = self.total_frames

Gotcha. I think we can solve this by providing a no_stream arg in the collector; happy to make it a feature if you think that would help.
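If that feature lands, usage might look as follows; no_stream is the hypothetical argument proposed in this thread, not an existing TorchRL option, while the rest follows the standard collector API.

from torchrl.collectors import SyncDataCollector

collector = SyncDataCollector(
    env,                         # placeholder wrapped env
    policy,                      # placeholder policy module
    frames_per_batch=24 * 4096,  # placeholder sizing
    total_frames=-1,             # collect until explicitly stopped
    no_stream=True,              # hypothetical: disable CUDA-stream handling
)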

yield tensordict_out.clone()


class ClipPPOLossWrapper(ClipPPOLoss):

Can you comment on what has been changed here?
There are many versions of PPO, and we probably don't cover exactly what you want it to do, but that can be fixed!
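For comparison, a stock TorchRL PPO objective looks roughly like this (standard API; policy and value_net are placeholders), which may help anchor what the wrapper changes.

from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=value_net, average_gae=True)
loss_module = ClipPPOLoss(
    actor_network=policy,      # placeholder ProbabilisticActor
    critic_network=value_net,  # placeholder value operator
    clip_epsilon=0.2,
    entropy_coef=0.01,
)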

return td_out


class TrainerWrapper(Trainer):

The trainer class hasn't been touched in a long time; as discussed, I'm considering subclassing it for the various losses we have, starting with PPO.
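A sketch of that direction; torchrl.trainers.Trainer exists, while the PPO-specific subclass below is hypothetical.

from torchrl.trainers import Trainer

class PPOTrainer(Trainer):  # hypothetical subclass, not released TorchRL API
    """Trainer specialized for PPO-style losses that re-use each batch."""

    def __init__(self, *args, num_epochs: int = 4, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_epochs = num_epochs  # PPO optimizes each collected batch several times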

Development

Successfully merging this pull request may close these issues.

[Proposal] Add TorchRL to IsaacLab workflows
4 participants