# MARL with composite actions

This small example illustrates my struggles and attempts at adapting TorchRL to multi-agent PPO with composite action spaces.

My pain points were:

- individual log-prob keys per action cause issues when tensordicts are stacked somewhere internally
- distributions that are not natively multivariate need special handling when computing log probs (see the first sketch below)
- the PPO loss does not deal well with nested tensordicts; the sample log-prob tensor has to be extracted and aggregated into a single key (see the second sketch below)
- the PettingZoo env wrapper does not properly split the action tensors when dict action spaces are used
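
For the multivariate log-prob point, the core of the problem is that a distribution like `Normal` returns one log prob per action dimension rather than one per sample. A minimal sketch of the usual fix (wrapping it in `torch.distributions.Independent`, or equivalently summing over the last dimension) follows; the shapes are made up for illustration:

```python
import torch
from torch.distributions import Independent, Normal

loc = torch.zeros(8, 2)    # batch of 8, 2-dimensional continuous action
scale = torch.ones(8, 2)
action = torch.zeros(8, 2)

# Normal is not natively multivariate: log_prob gives one value per dimension.
per_dim = Normal(loc, scale).log_prob(action)                  # shape [8, 2]

# Independent reinterprets the last dim as part of the event, yielding the
# single per-sample log prob that PPO expects.
joint = Independent(Normal(loc, scale), 1).log_prob(action)    # shape [8]

assert torch.allclose(joint, per_dim.sum(-1))
```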
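
For the PPO loss point, one workaround is to collapse the per-head log probs into a single entry before handing the rollout to the loss, and to point the loss at that entry via `set_keys`. The sketch below assumes a hypothetical rollout layout with two action heads (`move_log_prob`, `aim_log_prob`) nested under an `agents` group; those key names are placeholders for illustration, not torchrl defaults.

```python
import torch
from tensordict import TensorDict

# Hypothetical rollout batch: one log-prob tensor per action head, nested
# under an "agents" group (key names are placeholders).
td = TensorDict(
    {
        "agents": {
            "move_log_prob": torch.randn(32, 2),  # batch of 32, 2 agents
            "aim_log_prob": torch.randn(32, 2),
        }
    },
    batch_size=[32],
)

# Sum the per-head log probs into the single tensor the PPO loss reads.
td.set(
    ("agents", "sample_log_prob"),
    td.get(("agents", "move_log_prob")) + td.get(("agents", "aim_log_prob")),
)

# The loss module can then be told where to find it, e.g.:
# loss_module.set_keys(sample_log_prob=("agents", "sample_log_prob"))
```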