PettingZoo AECEnv wrapper causing issues with Batch class #1244

Open · encratite opened this issue Feb 11, 2025 · 6 comments

@encratite

I'm trying to do MARL with a PettingZoo environment and Tianshou's PPO implementation, but it looks like the PettingZoo AECEnv wrapper is inserting agent_id values that cause issues during batch processing. I'm not sure whether this is a bug or whether I'm just misusing some components. The OnpolicyTrainer throws the following exception in test_episode:

[screenshot of the exception traceback]

The offending agent_id properties were likely generated by the PettingZooEnv wrapper:

    def reset(self, *args: Any, **kwargs: Any) -> tuple[dict, dict]:
        self.env.reset(*args, **kwargs)

        observation, reward, terminated, truncated, info = self.env.last(self)

        if isinstance(observation, dict) and "action_mask" in observation:
            observation_dict = {
                "agent_id": self.env.agent_selection,
                "obs": observation["observation"],
                "mask": [obm == 1 for obm in observation["action_mask"]],
            }
        else:
            if isinstance(self.action_space, spaces.Discrete):
                observation_dict = {
                    "agent_id": self.env.agent_selection,
                    "obs": observation,
                    "mask": [True] * self.env.action_space(self.env.agent_selection).n,
                }
            else:
                observation_dict = {
                    "agent_id": self.env.agent_selection,
                    "obs": observation,
                }

        return observation_dict, info

The test_ppo.py sample, which works fine but doesn't support MARL, is similar to my code but uses the CartPole Gymnasium env instead, without the PettingZoo wrapper.

Here's my PettingZoo env:

https://github.com/encratite/thumper/blob/master/environment.py

Here's the Tianshou PPO training code:

https://github.com/encratite/thumper/blob/master/train.py

Any idea what I'm doing wrong? Or is this an actual incompatibility between different Tianshou components?

@yashschandra
Contributor

yashschandra commented Feb 11, 2025

You can refer to https://github.com/thu-ml/tianshou/blob/master/test/pettingzoo/tic_tac_toe.py for proper PettingZooEnv usage in the MARL case; the core wrapping pattern it uses is sketched below.
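
For reference, this is roughly the wrapping pattern that example uses; treat it as a sketch only, since constructor details differ slightly between Tianshou versions:

    from pettingzoo.classic import tictactoe_v3

    from tianshou.env import DummyVectorEnv
    from tianshou.env.pettingzoo_env import PettingZooEnv


    def get_env():
        # Wrap the raw AECEnv so Tianshou sees observations of the form
        # {"agent_id": ..., "obs": ..., "mask": ...}
        return PettingZooEnv(tictactoe_v3.env())


    # Vectorize for the Collector; one env per vector is enough for a smoke test.
    train_envs = DummyVectorEnv([get_env])
    test_envs = DummyVectorEnv([get_env])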

@encratite
Author

encratite commented Feb 11, 2025

I also checked out that example, but it uses DQN, and my previous reading suggested that I would likely be better off with PPO for my particular use case, so I went with PPOPolicy + OnpolicyTrainer right away. My reference was a sample that used a Gymnasium env rather than a PettingZoo env, though.

I tried to run the specific example you linked but it's actually not compatible with the version of Tianshou I was using (1.1.0 off pip), so I cloned the latest version from GitHub and installed it inside my Conda environment using Poetry.

It worked out of the box, although it printed a huge number of error messages that heavily slowed down the execution. I got thousands of these:
ep_return should be a scalar but is a numpy array: self._ep_return.shape=(4,). This doesn't make sense for a ReplayBuffer, but currently tests of CachedReplayBuffer require this behavior for some reason. Should be fixed ASAP! Returning an array of zeros instead of a scalar zero.

Perhaps I should revert to the version from pip and just use an older version of that sample instead?

Anyway, I inserted my own AECEnv (via the PettingZoo wrapper) into the sample you recommended, increased the number of agents from 2 to 4, and it actually appears to work.

Is it possible that the issue I'm encountering only occurs when using the PettingZoo wrapper in combination with PPO rather than DQN? As far as I can tell all of the samples in the directory you linked (Pistonball and Tic Tac Toe) are limited to DQN.

Edit: ah, I misread; one of the examples does actually use PPOPolicy, but it's continuous rather than discrete. I could try to redirect it to my environment.

@encratite
Author

encratite commented Feb 12, 2025

Oh boy, the unmodified PettingZoo PPOPolicy sample from that directory actually fails in a similar location, but with a different error, at least in 1.1.0. Apparently the MultiAgentPolicyManager is expecting obs_next in the forward call, but it's missing:

[screenshot of the exception traceback]

Even with the latest version off GitHub this still seems to happen. There are definitely several issues here, judging from how even the unmodified tests from the Tianshou repository are failing.

@yashschandra
Contributor

> Even with the latest version off GitHub this still seems to happen. There are definitely several issues here, judging from how even the unmodified tests from the Tianshou repository are failing.

It seems to be failing because, if the mask attribute is not present in an observation, MultiAgentPolicyManager expects obs_next in the batch.

[screenshot]

> Edit: ah, I misread; one of the examples does actually use PPOPolicy, but it's continuous rather than discrete. I could try to redirect it to my environment.

I think if you use a discrete action space (where mask is usually present in every observation) rather than a continuous one (where mask may not be present), you should be able to run any policy with MultiAgentPolicyManager.
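
To illustrate, based on the reset() code quoted in the first post, the wrapper produces observation dicts of roughly these two shapes (the agent name and array shapes below are made up for illustration):

    import numpy as np

    # Discrete action space, or an env that provides an action_mask:
    # the "mask" key is always present, so the policy manager can pick
    # legal actions without falling back to obs_next.
    obs_discrete = {
        "agent_id": "player_1",                      # made-up agent name
        "obs": np.zeros((3, 3), dtype=np.float32),   # placeholder observation
        "mask": [True, False, True, True],           # one flag per discrete action
    }

    # Continuous action space without an action_mask: no "mask" key at all,
    # which is the case where the obs_next lookup comes into play.
    obs_continuous = {
        "agent_id": "player_1",
        "obs": np.zeros(4, dtype=np.float32),
    }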

Side note: according to ChatGPT:

[screenshot of ChatGPT output]

@encratite
Author

Nah, I'm afraid switching to a discrete action space just brings back the agent_id error I described in the first post. To reproduce this, I used neither my original training code nor my own environment; I used the Tic Tac Toe code with the PettingZoo wrapper and replaced the DQN policy with a PPO policy. Here's the code I used:
https://gist.github.com/encratite/a6353cde2ed85dad580948fc39091cc5
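
For context, the policy swap boils down to roughly the following. This is only a sketch: the hidden sizes and hyperparameters are arbitrary, both players share one PPO policy here, and the keyword arguments vary slightly across Tianshou versions:

    import gymnasium
    import torch

    from pettingzoo.classic import tictactoe_v3
    from tianshou.env.pettingzoo_env import PettingZooEnv
    from tianshou.policy import MultiAgentPolicyManager, PPOPolicy
    from tianshou.utils.net.common import ActorCritic, Net
    from tianshou.utils.net.discrete import Actor, Critic

    env = PettingZooEnv(tictactoe_v3.env())

    # Tic-tac-toe exposes a Dict observation space with an action mask,
    # so unwrap the actual observation part first.
    obs_space = (
        env.observation_space["observation"]
        if isinstance(env.observation_space, gymnasium.spaces.Dict)
        else env.observation_space
    )
    state_shape = obs_space.shape or obs_space.n
    action_shape = env.action_space.shape or env.action_space.n

    # Shared feature extractor plus discrete actor/critic heads.
    net = Net(state_shape, hidden_sizes=[128, 128])
    actor = Actor(net, action_shape)
    critic = Critic(net)
    optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=1e-4)

    agent = PPOPolicy(
        actor=actor,
        critic=critic,
        optim=optim,
        dist_fn=torch.distributions.Categorical,
        action_space=env.action_space,
        action_scaling=False,  # discrete actions, no rescaling
    )

    # Both players share the same PPO policy in this sketch.
    policy = MultiAgentPolicyManager(policies=[agent, agent], env=env)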

Here's the exception, same place:

[screenshot of the exception traceback]

@DerDennisOP

I'm facing the same problem. Using a discrete action space doesn't work.
