PettingZoo AECEnv wrapper causing issues with Batch class #1244
Comments
You can refer to https://github.com/thu-ml/tianshou/blob/master/test/pettingzoo/tic_tac_toe.py for proper usage.
I also checked out that example, but it uses DQN, and my previous reading suggested that I would likely be better off with PPO for my particular use case, so I went for PPOPolicy + OnpolicyTrainer right away. My reference was a sample that used a Gymnasium env rather than a PettingZoo env, though.

I tried to run the specific example you linked, but it's not compatible with the version of Tianshou I was using (1.1.0 off pip), so I cloned the latest version from GitHub and installed it inside my Conda environment using Poetry. It worked out of the box, although it printed a huge number of error messages that heavily slowed down execution. I got thousands of these:

Perhaps I should revert to the pip version and just use an older version of that sample instead? Anyway, I inserted my own AECEnv with the PettingZoo wrapper into your recommended sample, increased the number of agents from 2 to 4, and it actually appears to work. Is it possible that the issue I'm encountering only occurs when using the PettingZoo wrapper in combination with PPO rather than DQN? As far as I can tell, all of the samples in the directory you linked (Pistonball and Tic Tac Toe) are limited to DQN.

Edit: ah, I misread, one of the examples does actually use PPOPolicy, but it's continuous rather than discrete. I could try to redirect it to my environment.
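For concreteness, here is a rough sketch of the setup being described: one discrete PPO agent per player, combined by MultiAgentPolicyManager. This is not anyone's actual code from this thread; the class and argument names follow the Tianshou 1.x keyword-only API as I understand it and have changed between releases, so treat it as an outline rather than a verified snippet.

```python
# Sketch only: one discrete PPO agent per player for the tic_tac_toe example,
# combined by MultiAgentPolicyManager. Argument names follow the Tianshou 1.x
# keyword-only API as I understand it and may differ in other releases.
import torch
from pettingzoo.classic import tictactoe_v3
from tianshou.env import PettingZooEnv
from tianshou.policy import MultiAgentPolicyManager, PPOPolicy
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.discrete import Actor, Critic

env = PettingZooEnv(tictactoe_v3.env())
state_shape = env.observation_space["observation"].shape
action_shape = env.action_space.n

def make_ppo_agent() -> PPOPolicy:
    net = Net(state_shape=state_shape, hidden_sizes=[128, 128])
    actor = Actor(net, action_shape)
    critic = Critic(net)
    # ActorCritic deduplicates the shared preprocessing net's parameters
    optim = torch.optim.Adam(ActorCritic(actor, critic).parameters(), lr=1e-4)
    return PPOPolicy(
        actor=actor,
        critic=critic,
        optim=optim,
        dist_fn=torch.distributions.Categorical,
        action_space=env.action_space,
    )

# one independent PPO policy per agent; the manager routes batches by agent_id
policy = MultiAgentPolicyManager(
    policies=[make_ppo_agent() for _ in env.agents], env=env
)
```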
Oh boy, the unmodified PettingZoo PPOPolicy sample from that directory actually fails in a similar place, but with a different error, at least in 1.1.0. Apparently the MultiAgentPolicyManager expects obs_next in the forward call, but it's missing:

Even with the latest version off GitHub this still happens. There are definitely several issues here, judging from how even the unmodified tests from the Tianshou repository are failing.
Nah, I'm afraid switching to a discrete action space just causes the agent_id error I already described in the first post again. I used neither my original training code nor my own environment to reproduce this; I used the Tic Tac Toe code with the PettingZoo wrapper and replaced the DQN policy with a PPO policy. Here's the code I used:

Here's the exception, same place:
I'm facing the same problem. Using a discrete action space doesn't work.
I'm trying to do MARL with a PettingZoo environment and Tianshou's PPO implementation but it looks like the PettingZoo AECEnv wrapper is inserting agent_id values that are causing issues with the batch processing. I'm not sure if this is a bug. I might just be misusing some components. The OnpolicyTrainer is throwing the following exception in test_episode:
The offending agent_id properties were likely generated by the PettingZooEnv wrapper:
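As an illustrative sketch (not the snippet referenced above), this is roughly what the wrapper hands back, based on my reading of tianshou/env/pettingzoo_env.py; field details may vary between versions:

```python
# Illustrative sketch: what Tianshou's PettingZooEnv wrapper returns on reset/step,
# based on my reading of tianshou/env/pettingzoo_env.py (details may vary by version).
from pettingzoo.classic import tictactoe_v3
from tianshou.env import PettingZooEnv

env = PettingZooEnv(tictactoe_v3.env())
obs, info = env.reset()
# obs is a dict, not a raw array:
#   obs["agent_id"] -> the current agent's name, e.g. "player_1" (a string)
#   obs["obs"]      -> the actual observation array
#   obs["mask"]     -> the legal-action mask
# The Collector stacks these dicts into a Batch, so the policy sees
# batch.obs.agent_id / batch.obs.obs / batch.obs.mask. MultiAgentPolicyManager
# uses agent_id to route each transition; a bare single-agent policy does not.
print(obs["agent_id"], obs["obs"].shape, obs["mask"])
```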
The test_ppo.py sample, which works fine but doesn't support MARL, is similar to my code but uses the CartPole Gymnasium env instead, without the PettingZoo wrapper.
Here's my PettingZoo env:
https://github.com/encratite/thumper/blob/master/environment.py
Here's the Tianshou PPO training code:
https://github.com/encratite/thumper/blob/master/train.py
Any idea what I'm doing wrong? Or is this an actual incompatibility between different Tianshou components?
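For reference, this is roughly the wiring I'm aiming for, mirroring the multi-agent examples (tictactoe_v3 stands in for my environment here; the API names follow Tianshou 1.x and may need adjusting for other versions):

```python
# Rough wiring sketch: PettingZooEnv -> DummyVectorEnv -> Collector, with a
# MultiAgentPolicyManager as `policy`. tictactoe_v3 is a stand-in for my env;
# names follow the Tianshou 1.x API and may differ in other releases.
from pettingzoo.classic import tictactoe_v3
from tianshou.data import Collector, VectorReplayBuffer
from tianshou.env import DummyVectorEnv, PettingZooEnv

def make_env():
    # Swap tictactoe_v3 for the actual AECEnv; the wrapping stays the same.
    return PettingZooEnv(tictactoe_v3.env())

train_envs = DummyVectorEnv([make_env for _ in range(8)])
test_envs = DummyVectorEnv([make_env for _ in range(8)])

# `policy` is a MultiAgentPolicyManager wrapping one PPOPolicy per agent;
# routing by agent_id is supposed to happen only inside the manager.
train_collector = Collector(
    policy, train_envs, VectorReplayBuffer(20_000, len(train_envs))
)
test_collector = Collector(policy, test_envs)
```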