This small example is meant to illustrate my struggles and attempts at adapting TorchRL to multi-agent PPO with composite action spaces.

My pain points were:
- individual log-prob keys for the actions cause issues with stacking TensorDicts somewhere internally; aggregating them into one flat key works around it (first sketch after this list)
- distributions that are not natively multivariate need special handling when calculating log probs (second sketch)
- the PPO loss does not deal well with nested TensorDicts; the sample log-prob tensor has to be extracted into a flat key it can read (third sketch)
- the PettingZoo env wrapper does not properly split the action tensors when using dict action spaces (fourth sketch)
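
On the first point, here is a minimal sketch of the aggregation workaround I mean, assuming an actor that writes one log-prob entry per action component; the key names `("action", "move_log_prob")` and `("action", "target_log_prob")` are made up for illustration:

```python
import torch
from tensordict import TensorDict

# hypothetical per-action log-prob keys, as written by a composite actor
LOG_PROB_KEYS = [("action", "move_log_prob"), ("action", "target_log_prob")]

def aggregate_log_probs(td, keys=LOG_PROB_KEYS, out_key="sample_log_prob"):
    # sum the component log probs into one flat tensor and drop the
    # originals, so every tensordict ends up with the same key set
    td.set(out_key, sum(td.get(k) for k in keys))
    for k in keys:
        del td[k]
    return td

step_a = TensorDict(
    {
        ("action", "move_log_prob"): torch.randn(4),
        ("action", "target_log_prob"): torch.randn(4),
    },
    batch_size=[4],
)
step_b = step_a.clone()

# with identical flat keys, stacking no longer trips over the
# per-action log-prob entries
stacked = torch.stack([aggregate_log_probs(step_a), aggregate_log_probs(step_b)])
print(stacked["sample_log_prob"].shape)  # torch.Size([2, 4])
```

If I recall correctly, recent tensordict releases also expose a `set_composite_lp_aggregate` switch that does this aggregation for you; check whether your version has it before hand-rolling the above.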
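On the second point: a distribution like `Categorical` is not natively multivariate, so its `log_prob` returns one value per sub-action instead of one per sample, and the PPO ratio needs the latter. Wrapping it in `torch.distributions.Independent` sums over the event dims. This is plain PyTorch, with shapes chosen for illustration:

```python
import torch
from torch.distributions import Categorical, Independent

# batch of 8 samples, 3 independent sub-actions with 5 choices each
logits = torch.randn(8, 3, 5)

base = Categorical(logits=logits)
action = base.sample()              # shape [8, 3]
print(base.log_prob(action).shape)  # torch.Size([8, 3]) -- per sub-action

# Independent reinterprets the trailing batch dim as an event dim, so
# log_prob sums over the 3 sub-actions and returns one value per sample
dist = Independent(Categorical(logits=logits), reinterpreted_batch_ndims=1)
print(dist.log_prob(action).shape)  # torch.Size([8])
```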
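On the third point, a sketch of extracting the flat sample log-prob tensor before calling the loss; the key names follow the first sketch, and the `ClipPPOLoss` lines are left as comments since the actor/critic construction is omitted here:

```python
import torch
from tensordict import TensorDict

# illustrative nested log-prob keys; in a real run these come out of
# the composite actor rather than being built by hand
batch = TensorDict(
    {
        ("action", "move_log_prob"): torch.randn(32),
        ("action", "target_log_prob"): torch.randn(32),
    },
    batch_size=[32],
)

# collapse the nested entries into the single flat tensor the loss reads
batch["sample_log_prob"] = (
    batch["action", "move_log_prob"] + batch["action", "target_log_prob"]
)
print(batch["sample_log_prob"].shape)  # torch.Size([32])

# the loss can then be pointed at that key, roughly:
#   loss_module = ClipPPOLoss(actor_network=actor, critic_network=critic)
#   loss_module.set_keys(sample_log_prob="sample_log_prob")
#   loss_vals = loss_module(batch)
```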
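On the fourth point, one way around the wrapper is to split the flat action tensor by hand before stepping the env. The key path `("agents", "action")` and the component names/widths below are placeholders; the real layout has to come from the env's dict action space. The same logic could also live in a custom `Transform`, but a plain function keeps the sketch short:

```python
import torch
from tensordict import TensorDict

# hypothetical component layout of the flat action tensor
COMPONENTS = (("move", 4), ("target", 2))

def split_dict_action(td, key=("agents", "action"), components=COMPONENTS):
    # replace the flat action tensor with the per-component entries
    # that the dict action space expects
    flat = td[key]
    del td[key]
    start = 0
    for name, width in components:
        td[key + (name,)] = flat[..., start:start + width]
        start += width
    return td

td = TensorDict({("agents", "action"): torch.randn(3, 6)}, batch_size=[3])
split_dict_action(td)
print(td["agents", "action", "move"].shape)    # torch.Size([3, 4])
print(td["agents", "action", "target"].shape)  # torch.Size([3, 2])

# applied to the action tensordict just before env.step(td)
```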