
Add wrappers for TorchRL training workflow #1178

Open · wants to merge 10 commits into main

Conversation

@fyu-bdai (Contributor) commented Oct 8, 2024

Description

Adds TorchRL module wrappers, PPO runner, and PPO runner cfg for training IsaacLab environments with TorchRL.

This PR is the first in a series of three that together add a complete training pipeline for the Anymal-D training environment using TorchRL. It contains the core wrapper modules, which should be merged first.

Related PRs:
#1179 Adds torchrl play and train scripts.
#1180 Adds Anymal-D torchrl training configuration.

Fixes #1181
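For orientation, a minimal usage sketch: TorchRLEnvWrapper appears in this PR's diff, but the module path and task id below are illustrative assumptions, not confirmed by the PR.

import gymnasium as gym

from omni.isaac.lab_tasks.utils.wrappers.torchrl import TorchRLEnvWrapper  # assumed module path

env = gym.make("Isaac-Velocity-Flat-Anymal-D-v0")  # assumed Anymal-D task id
env = TorchRLEnvWrapper(env)  # exposes the Isaac Lab env through TorchRL's EnvBase API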

⚠️ Not Supported

  • Empirical normalization.
  • Recurrent networks.
  • Training with Neptune logger.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@fyu-bdai (Contributor, Author) commented Oct 8, 2024

Can Vincent Moens also get tagged as a reviewer?

@fyu-bdai fyu-bdai changed the title Add Wrappers for TorchRL training workflow Add wrappers for TorchRL training workflow Oct 8, 2024
@fyu-bdai fyu-bdai mentioned this pull request Oct 8, 2024
@Toni-SM (Contributor) commented Oct 8, 2024

In my opinion, non-wrapper code (e.g., the torchrl runner code) should be in the source/standalone/workflows folder.

source/standalone/workflows
├── torchrl
│   ├── ppo/      <-- code here
│   ├── cli_args.py
│   ├── play.py
│   └── train.py

Having the PPO runner code in an extension will generate non-Isaac-Lab-related version changes and changelog records whenever the code needs to be modified.

@fyu-bdai (Contributor, Author) commented Oct 8, 2024

> In my opinion, non-wrapper code (e.g., the torchrl runner code) should be in the source/standalone/workflows folder.
>
> source/standalone/workflows
> ├── torchrl
> │   ├── ppo/      <-- code here
> │   ├── cli_args.py
> │   ├── play.py
> │   └── train.py
>
> Having the PPO runner code in an extension will generate non-Isaac-Lab-related version changes and changelog records whenever the code needs to be modified.

I realized this causes issues with the environment-specific PPO configurations, which need to import the TorchRL PPO runner cfgs. I can move only the configs to omni.isaac.lab_tasks.utils.wrappers, but that would split the locations of the TorchRL runner and its runner cfg. I have elected to move them back for now.
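A sketch of the coupling described above, with assumed class and module names: the per-task agent cfg subclasses the shared runner cfg, which is why the cfg has to live in an importable extension module rather than under source/standalone/workflows.

from omni.isaac.lab_tasks.utils.wrappers.torchrl import TorchRLPpoRunnerCfg  # assumed name/path

class AnymalDFlatPpoRunnerCfg(TorchRLPpoRunnerCfg):  # hypothetical per-task cfg
    """Per-task overrides on top of the shared TorchRL PPO runner configuration."""

    experiment_name = "anymal_d_flat"
    max_iterations = 1500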

@jsmith-bdai (Collaborator) commented Dec 13, 2024

Hey @fyu-bdai, can you clarify why a single environment doesn't work?

@fyu-bdai (Contributor, Author) commented Jan 9, 2025

> Hey @fyu-bdai, can you clarify why a single environment doesn't work?

I took another look and fixed the single-environment training bug with TorchRL.


@vmoens left a comment


Very quick review; happy to discuss each of these points.

from typing import TYPE_CHECKING

import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec

You may want to use Composite and Unbounded - we're deprecating UnboundedContinuousTensorSpec
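A sketch of the suggested migration; the new names are exported from torchrl.data in recent releases and should be drop-in replacements.

import torch

# CompositeSpec -> Composite, UnboundedContinuousTensorSpec -> Unbounded;
# the old names remain as deprecated aliases for a transition period.
from torchrl.data import Composite, Unbounded

reward_spec = Composite(
    episode_reward=Unbounded(shape=(1,), dtype=torch.float32),
)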


import wandb
from torchrl.data.tensor_specs import CompositeSpec, UnboundedContinuousTensorSpec
from torchrl.envs.libs.gym import GymEnv

Suggested change
- from torchrl.envs.libs.gym import GymEnv
+ from torchrl.envs import GymEnv

"""
if self._simple_done:
    done = tensordict._get_str("done", default=None)

Suggested change
-    done = tensordict._get_str("done", default=None)
+    done = tensordict.get("done", default=None)

Using the public get() could be a bit more robust to backward-compatibility-breaking changes (even though I don't expect any). The overhead is low and can be absorbed by compile.


Comment on lines +45 to +46
"""Checks the done keys of the input tensordict. Unlike the base GymWrapper implementation, we do not
call env.reset()

Interesting, we bumped into a similar problem with @luisenp and @teopir.

We need better handling of auto-resetting envs in torchrl.

How do you get the last values of a trajectory (if you can get them)?
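For reference, one possible answer under the gymnasium vector-env convention (assumed here; Isaac Lab may expose this differently, or not at all): the true pre-reset observation is stashed in the info dict, which is what a value bootstrap for truncated episodes needs.

import torch

def get_last_values(critic, info, truncated):
    """Bootstrap values for episodes cut off at this step, assuming the
    gymnasium convention that the pre-reset obs lands in ``info``."""
    if "final_observation" not in info:  # env did not expose the terminal obs
        return None
    with torch.no_grad():
        return critic(info["final_observation"])[truncated]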


self._curr_ep_len[new_ids] = 0
self._curr_reward_sum[new_ids] = 0
tensordict.set("episode_length", self._ep_len_buf)

Here too we could use StepCounter.
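For reference, a sketch of that route; StepCounter is an existing TorchRL transform, and base_env stands in for the wrapped Isaac Lab env.

from torchrl.envs import TransformedEnv
from torchrl.envs.transforms import StepCounter

# StepCounter maintains a per-env "step_count" entry in the tensordict and
# zeroes it on reset, replacing the manual episode-length buffers above.
env = TransformedEnv(base_env, StepCounter())  # base_env: placeholder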

from omni.isaac.lab.envs import ManagerBasedRLEnv


class TorchRLEnvWrapper(GymWrapper):

I think we can get around most of the extra features here with torchrl transforms and by telling GymWrapper that the env is auto-resetting; happy to help get this done.
That will be more robust long term, as this implementation relies on a bunch of private torchrl features that we could modify in the future.
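A sketch of that proposal, assuming the auto_reset flag available on recent TorchRL gym wrappers:

from torchrl.envs import GymWrapper, TransformedEnv
from torchrl.envs.transforms import StepCounter

# auto_reset=True tells TorchRL the env resets itself on done (assumed to be
# supported by the installed TorchRL version), so GymWrapper's private reset
# machinery does not need overriding; transforms handle the bookkeeping.
env = GymWrapper(isaac_env, auto_reset=True)  # isaac_env: placeholder Isaac Lab env
env = TransformedEnv(env, StepCounter())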

Comment on lines +302 to +304
# The portion of the code handling cuda streams has been removed in this inherited method, which
# caused CUDA memory allocation issues with IsaacSim during env stepping.
total_frames = self.total_frames

Gotcha. I think we can solve this by providing a no_stream arg in the collector; happy to make it a feature if you think that would help.
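If that feature lands, usage might look as follows; no_stream is the hypothetical argument proposed in this thread, not an existing TorchRL option, while the rest follows the standard collector API.

from torchrl.collectors import SyncDataCollector

collector = SyncDataCollector(
    env,                         # placeholder wrapped env
    policy,                      # placeholder policy module
    frames_per_batch=24 * 4096,  # placeholder sizing
    total_frames=-1,             # collect until explicitly stopped
    no_stream=True,              # hypothetical: disable CUDA-stream handling
)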

yield tensordict_out.clone()


class ClipPPOLossWrapper(ClipPPOLoss):

Can you comment on what has been changed here?
There are many versions of PPO, and we probably don't cover exactly what you want it to do, but that can be fixed!
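For comparison, a stock TorchRL PPO objective looks roughly like this (standard API; policy and value_net are placeholders), which may help anchor what the wrapper changes.

from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

advantage_module = GAE(gamma=0.99, lmbda=0.95, value_network=value_net, average_gae=True)
loss_module = ClipPPOLoss(
    actor_network=policy,      # placeholder ProbabilisticActor
    critic_network=value_net,  # placeholder value operator
    clip_epsilon=0.2,
    entropy_coef=0.01,
)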

return td_out


class TrainerWrapper(Trainer):

The trainer class hasn't been touched in a long time; as discussed, I'm considering subclassing it for the various losses we have, starting with PPO.
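A sketch of that direction; torchrl.trainers.Trainer exists, while the PPO-specific subclass below is hypothetical.

from torchrl.trainers import Trainer

class PPOTrainer(Trainer):  # hypothetical subclass, not released TorchRL API
    """Trainer specialized for PPO-style losses that re-use each batch."""

    def __init__(self, *args, num_epochs: int = 4, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_epochs = num_epochs  # PPO optimizes each collected batch several times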

Development

Successfully merging this pull request may close these issues.

[Proposal] Add TorchRL to IsaacLab workflows
4 participants