[BUG] ParallelEnv handling of done flag when wrapped envs have non-empty batch_size
#776
Labels: bug
Describe the bug
Currently in `ParallelEnv`, the method `_run_worker_pipe_shared_mem` has the following line to check if the environment is done: ...
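For illustration only (this is not the actual TorchRL source), a minimal sketch of why such a check breaks once the wrapped env has a non-empty batch size, assuming the line is roughly of the form `if _td.get("done"):`:

```python
import torch

# "done" entry produced by a wrapped env with batch_size=[4]: shape (4, 1)
done = torch.zeros(4, 1, dtype=torch.bool)

try:
    if done:  # truth value of a multi-element tensor is undefined
        pass
except RuntimeError as err:
    print(err)  # Boolean value of Tensor with more than one element is ambiguous
```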
The problem is that `_td.get("done")` is a tensor of shape `(*wrapped_env.batch_dim, 1)`, and thus the `if` is undefined for a tensor with multiple values. Furthermore, `ParallelEnv` uses the functions `all()` and `any()` multiple times without specifying dimensions. This means the reduction spans all dimensions (including the environment ones, which could be arbitrarily many), which risks injecting bias and bugs. This is done, for example, in the `_reset()` function.
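Purely as an illustrative sketch (not the library code), the snippet below shows how an un-dimensioned reduction over a done tensor of an assumed shape `(num_workers, *env_batch_size, 1)` collapses the per-environment information, whereas reducing only over the trailing dimension preserves it:

```python
import torch

# Done flags for 2 workers, each wrapping an env with batch_size=[3]: shape (2, 3, 1)
done = torch.zeros(2, 3, 1, dtype=torch.bool)
done[0, 1] = True  # only one sub-environment of worker 0 is done

# Reducing over every dimension loses which sub-environment is done
print(done.any())    # tensor(True) -> a single global flag

# Reducing only over the trailing singleton dim keeps the env dimensions
print(done.any(-1))  # shape (2, 3) -> per-sub-environment flags
```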
To Reproduce
Add the instruction `parallel_env.rand_step()` to the test introduced in #774.

Reason and Possible fixes
The decision on how to handle the done flag in environments with non-empty batch sizes is not so trivial.
My suggestion is to remove this logic from the function that is throwing the error and give the done vector to the user as is.
If checks have to be performed on the done vector, we have to keep its arbitrary shape in mind.
I suggest removing all the logic checks that use `all()` and `any()` over all dimensions, such as the ones in `_reset()` mentioned above. This is because users might need to leave some dims done and reset others.
In place of those checks, one can verify more cleverly that only the dimensions which were reset are actually not done; a sketch of such a check follows.
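As a sketch of what such a check could look like (hypothetical helper and names, not an existing TorchRL function; `_reset` is assumed to be a boolean mask with the env batch shape marking the entries that were just reset):

```python
import torch

def check_only_reset_entries_not_done(done: torch.Tensor, _reset: torch.Tensor) -> None:
    """Verify that the entries that were just reset report done=False.

    `done` is assumed to have shape (*batch_size, 1) and `_reset` shape
    (*batch_size,), for an arbitrary number of batch dimensions.
    """
    done = done.squeeze(-1)
    # Only look at the entries that were reset; the others are left untouched.
    if done[_reset].any():
        raise RuntimeError("An entry was reset but still reports done=True.")

# Example with an arbitrary batch shape: 2 workers, each wrapping 3 sub-envs.
done = torch.zeros(2, 3, 1, dtype=torch.bool)
_reset = torch.ones(2, 3, dtype=torch.bool)
check_only_reset_entries_not_done(done, _reset)  # passes: reset entries are not done
```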