PettingZoo AECEnv wrapper causing issues with Batch class #1244

Open · encratite opened this issue Feb 11, 2025 · 6 comments

@encratite

I'm trying to do MARL with a PettingZoo environment and Tianshou's PPO implementation, but it looks like the PettingZoo AECEnv wrapper is inserting agent_id values that cause issues during batch processing. I'm not sure whether this is a bug or whether I'm just misusing some components. The OnpolicyTrainer throws the following exception in test_episode:

[screenshot of the exception traceback]

The offending agent_id properties were likely generated by the PettingZooEnv wrapper:

    def reset(self, *args: Any, **kwargs: Any) -> tuple[dict, dict]:
        self.env.reset(*args, **kwargs)

        observation, reward, terminated, truncated, info = self.env.last(self)

        if isinstance(observation, dict) and "action_mask" in observation:
            observation_dict = {
                "agent_id": self.env.agent_selection,
                "obs": observation["observation"],
                "mask": [obm == 1 for obm in observation["action_mask"]],
            }
        else:
            if isinstance(self.action_space, spaces.Discrete):
                observation_dict = {
                    "agent_id": self.env.agent_selection,
                    "obs": observation,
                    "mask": [True] * self.env.action_space(self.env.agent_selection).n,
                }
            else:
                observation_dict = {
                    "agent_id": self.env.agent_selection,
                    "obs": observation,
                }

        return observation_dict, info

The test_ppo.py sample, which works fine but doesn't support MARL, is similar to my code but uses the CartPole Gymnasium env instead, without the PettingZoo wrapper.

Here's my PettingZoo env:

https://github.com/encratite/thumper/blob/master/environment.py

Here's the Tianshou PPO training code:

https://github.com/encratite/thumper/blob/master/train.py

Any idea what I'm doing wrong? Or is this an actual incompatibility between different Tianshou components?

@yashschandra
Contributor

yashschandra commented Feb 11, 2025

You can refer to https://github.com/thu-ml/tianshou/blob/master/test/pettingzoo/tic_tac_toe.py for proper PettingZooEnv usage in the MARL case; the core wrapping pattern it uses is sketched below.
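
For reference, this is roughly the wrapping pattern that example uses; treat it as a sketch only, since constructor details differ slightly between Tianshou versions:

    from pettingzoo.classic import tictactoe_v3

    from tianshou.env import DummyVectorEnv
    from tianshou.env.pettingzoo_env import PettingZooEnv


    def get_env():
        # Wrap the raw AECEnv so Tianshou sees observations of the form
        # {"agent_id": ..., "obs": ..., "mask": ...}
        return PettingZooEnv(tictactoe_v3.env())


    # Vectorize for the Collector; one env per vector is enough for a smoke test.
    train_envs = DummyVectorEnv([get_env])
    test_envs = DummyVectorEnv([get_env])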

@encratite
Author

encratite commented Feb 11, 2025

I also checked out that example, but it uses DQN, and my previous reading suggested that I would likely be better off with PPO for my particular use case, so I went with PPOPolicy + OnpolicyTrainer right away. My reference was a sample that used a Gymnasium env rather than a PettingZoo env, though.

I tried to run the specific example you linked but it's actually not compatible with the version of Tianshou I was using (1.1.0 off pip), so I cloned the latest version from GitHub and installed it inside my Conda environment using Poetry.

It worked out of the box, although it printed a huge number of error messages that heavily slowed down the execution. I got thousands of these:
ep_return should be a scalar but is a numpy array: self._ep_return.shape=(4,). This doesn't make sense for a ReplayBuffer, but currently tests of CachedReplayBuffer require this behavior for some reason. Should be fixed ASAP! Returning an array of zeros instead of a scalar zero.

Perhaps I should revert to the version from pip and just use an older version of that sample instead?

Anyway, I inserted my own AECEnv (via the PettingZoo wrapper) into the sample you recommended, increased the number of agents from 2 to 4, and it actually appears to work.

Is it possible that the issue I'm encountering only occurs when using the PettingZoo wrapper in combination with PPO rather than DQN? As far as I can tell all of the samples in the directory you linked (Pistonball and Tic Tac Toe) are limited to DQN.

Edit: ah, I misread; one of the examples does actually use PPOPolicy, but it's continuous rather than discrete. I could try to redirect it to my environment.

@encratite
Author

encratite commented Feb 12, 2025

Oh boy, the unmodified PettingZoo PPOPolicy sample from that directory actually fails in a similar location, but with a different error, at least in 1.1.0. Apparently the MultiAgentPolicyManager is expecting obs_next in the forward call, but it's missing:

[screenshot of the exception traceback]

Even with the latest version off GitHub this still seems to happen. There are definitely several issues here, judging from how even the unmodified tests from the Tianshou repository are failing.

@yashschandra
Contributor

> Even with the latest version off GitHub this still seems to happen. There are definitely several issues here, judging from how even the unmodified tests from the Tianshou repository are failing.

It seems to be failing because, if the mask attribute is not present in an observation, MultiAgentPolicyManager expects obs_next in the batch.

[screenshot]

> Edit: ah, I misread; one of the examples does actually use PPOPolicy, but it's continuous rather than discrete. I could try to redirect it to my environment.

I think if you use a discrete action space (where mask is usually present in every observation) rather than a continuous one (where mask may not be present), you should be able to run any policy with MultiAgentPolicyManager.
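
To illustrate, based on the reset() code quoted in the first post, the wrapper produces observation dicts of roughly these two shapes (the agent name and array shapes below are made up for illustration):

    import numpy as np

    # Discrete action space, or an env that provides an action_mask:
    # the "mask" key is always present, so the policy manager can pick
    # legal actions without falling back to obs_next.
    obs_discrete = {
        "agent_id": "player_1",                      # made-up agent name
        "obs": np.zeros((3, 3), dtype=np.float32),   # placeholder observation
        "mask": [True, False, True, True],           # one flag per discrete action
    }

    # Continuous action space without an action_mask: no "mask" key at all,
    # which is the case where the obs_next lookup comes into play.
    obs_continuous = {
        "agent_id": "player_1",
        "obs": np.zeros(4, dtype=np.float32),
    }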

Side note: according to ChatGPT:

[screenshot of ChatGPT output]

@encratite
Author

Nah, I'm afraid switching to a discrete action space just brings back the agent_id error I described in the first post. To reproduce this, I used neither my original training code nor my own environment; I used the Tic Tac Toe code with the PettingZoo wrapper and replaced the DQN policy with a PPO policy. Here's the code I used:
https://gist.github.com/encratite/a6353cde2ed85dad580948fc39091cc5
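
For context, the policy swap boils down to roughly the following. This is only a sketch: the hidden sizes and hyperparameters are arbitrary, both players share one PPO policy here, and the keyword arguments vary slightly across Tianshou versions:

    import gymnasium
    import torch

    from pettingzoo.classic import tictactoe_v3
    from tianshou.env.pettingzoo_env import PettingZooEnv
    from tianshou.policy import MultiAgentPolicyManager, PPOPolicy
    from tianshou.utils.net.common import ActorCritic, Net
    from tianshou.utils.net.discrete import Actor, Critic

    env = PettingZooEnv(tictactoe_v3.env())

    # Tic-tac-toe exposes a Dict observation space with an action mask,
    # so unwrap the actual observation part first.
    obs_space = (
        env.observation_space["observation"]
        if isinstance(env.observation_space, gymnasium.spaces.Dict)
        else env.observation_space
    )
    state_shape = obs_space.shape or obs_space.n
    action_shape = env.action_space.shape or env.action_space.n

    # Shared feature extractor plus discrete actor/critic heads.
    net = Net(state_shape, hidden_sizes=[128, 128])
    actor = Actor(net, action_shape)
    critic = Critic(net)
    optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=1e-4)

    agent = PPOPolicy(
        actor=actor,
        critic=critic,
        optim=optim,
        dist_fn=torch.distributions.Categorical,
        action_space=env.action_space,
        action_scaling=False,  # discrete actions, no rescaling
    )

    # Both players share the same PPO policy in this sketch.
    policy = MultiAgentPolicyManager(policies=[agent, agent], env=env)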

Here's the exception, same place:

[screenshot of the exception traceback]

@DerDennisOP

I'm facing the same problem. Using a discrete action space doesn't work.
