
A Bug in HERReplayBuffer. #811

Closed
sunkafei opened this issue Feb 27, 2023 · 8 comments · Fixed by #817

Labels
bug Something isn't working

Comments

@sunkafei
Contributor

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, gym, torch, numpy, sys
    print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)

Hello, I have some questions about the code of rewrite_transitions in HERReplayBuffer:

    def rewrite_transitions(self, indices: np.ndarray) -> None:
        """Re-write the goal of some sampled transitions' episodes according to HER.
        Currently applies only HER's 'future' strategy. The new goals will be written \
        directly to the internal batch data temporarily and will be restored right \
        before the next sampling or when using some of the buffer's method (e.g. \
        `add`, `save_hdf5`, etc.). This is to make sure that n-step returns \
        calculation etc., performs correctly without additional alteration.
        """
        if indices.size == 0:
            return

        # Sort indices keeping chronological order
        indices[indices < self._index] += self.maxsize
        indices = np.sort(indices)
        indices[indices >= self.maxsize] -= self.maxsize

        # Construct episode trajectories
        indices = [indices]
        for _ in range(self.horizon - 1):
            indices.append(self.next(indices[-1]))
        indices = np.stack(indices)

        # Calculate future timestep to use
        current = indices[0]
        terminal = indices[-1]
        future_offset = np.random.uniform(size=len(indices[0])) * (terminal - current)
        future_offset = future_offset.astype(int)
        future_t = (current + future_offset)
        ...

As ReplayBuffer is implemented as a circular queue, the indices in terminal may be smaller than the corresponding indices in current. As a result, some elements of future_offset may be negative, which makes the states indexed by future_t not the desired future states.
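
To make the wrap-around concrete, here is a minimal standalone numpy sketch (not tianshou code; the buffer size and indices are assumed for illustration):

    import numpy as np

    maxsize = 15  # assumed buffer size

    # Suppose the sampled transition sits at index 10 and its episode wraps
    # around the circular buffer, so its last transition sits at index 4.
    current = np.array([10])
    terminal = np.array([4])

    # The original computation draws an offset proportional to
    # terminal - current, which is -6 here, so the offset can be negative.
    future_offset = (np.random.uniform(size=1) * (terminal - current)).astype(int)
    future_t = current + future_offset
    print(future_offset, future_t)  # e.g. [-4] [6]: index 6 is from an older episode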
@Juno-T

@Juno-T
Contributor

Juno-T commented Feb 27, 2023

Thanks for the question. I think this case was covered by the first line, indices[indices < self._index] += self.maxsize, where the indices that wrap around the circular queue get incremented by self.maxsize, making them chronologically ordered again.

However, if you have some code illustrating the bug, please feel free to share it so that we can investigate further.
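
For illustration, the re-ordering trick described above can be sketched with plain numpy (the maxsize and _index values are assumed):

    import numpy as np

    maxsize, _index = 15, 5  # assumed; _index is where the next write would go

    indices = np.array([12, 2, 7, 0])  # 2 and 0 wrapped past the end of the buffer
    indices[indices < _index] += maxsize    # unwrap:        [12, 17,  7, 15]
    indices = np.sort(indices)              # chronological: [ 7, 12, 15, 17]
    indices[indices >= maxsize] -= maxsize  # map back:      [ 7, 12,  0,  2]
    print(indices)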

@Juno-T
Contributor

Juno-T commented Feb 27, 2023

For reference, there is a test case covering cycled indices for HERReplayBuffer in test_buffer.py.

@sunkafei
Contributor Author

@Juno-T Thanks for your reply. I found that the variable future_t is used in the following two places:

        if self._save_obs_next:
            ep_obs_next = self[unique_ep_indices].obs_next
            future_obs = self[future_t[unique_ep_close_indices]].obs_next
        else:
            future_obs = self[self.next(future_t[unique_ep_close_indices])].obs

Since unique_ep_close_indices represents the last index of every episode, why not replace future_t with terminal and remove the calculation of future_t from the code?

@Juno-T
Contributor

Juno-T commented Feb 28, 2023

According to the paper, they found that instead of rewriting the goal with the terminal state's achieved goal, it is equally good or better to use a random future state's achieved goal (see Fig. 6).

If you look at future_t's calculation, you can see that it contains the indices of a random future state, varying by episode.

I think we should label this issue as question, @Trinkle23897.
Edit: it is indeed a bug.
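
For context, the 'future' strategy can be sketched abstractly (the episode layout below is hypothetical, not tianshou internals):

    import numpy as np

    # Hypothetical episode of achieved goals: relabel step t with the
    # achieved goal of a random step at or after t ("future" strategy).
    achieved_goals = np.arange(10)  # one achieved goal per timestep
    t = 3
    future = np.random.randint(t, len(achieved_goals))  # uniform over [t, 10)
    print(f"relabel step {t} with the achieved goal of step {future}: "
          f"{achieved_goals[future]}")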

@sunkafei
Contributor Author

@Juno-T I think I understand what you mean, but the calculation of future_t may be wrong. Here is an example to reproduce the error.

import numpy as np
from tianshou.data import Batch
from tianshou.data import HERReplayBuffer
from env import MyGoalEnv  # from test.base.env import MyGoalEnv
def test_herreplaybuffer():
    env_size = 99
    bufsize = 15
    env = MyGoalEnv(env_size, array_state=False)
    def compute_reward_fn(ag, g):
        return env.compute_reward_fn(ag, g, {})
    buf = HERReplayBuffer(bufsize, compute_reward_fn=compute_reward_fn, horizon=30, future_k=8)
    buf.future_p = 1
    
    for x, ep_len in enumerate([10, 20]):
        obs, _ = env.reset()
        for i in range(ep_len):
            act = 1
            obs_next, rew, terminated, truncated, info = env.step(act)
            batch = Batch(
                obs=obs,
                act=[act],
                rew=rew,
                terminated=(i == ep_len - 1),
                truncated=(i == ep_len - 1),
                obs_next=obs_next,
                info=info
            )
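            # Skip the first 10 transitions of the second episode, so the
            # stored states of episode 2 are numbered 10..19.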
            if x == 1 and obs["observation"] < 10:
                obs = obs_next
                continue
            buf.add(batch)
            obs = obs_next
    # The above code generates two episodes: states numbered 0 to 9 for the
    # first episode, and states numbered 10 to 19 for the second episode.
    buf._restore_cache()
    sample_indices = np.array([10])  # Suppose the sampled indices is [10]
    buf.rewrite_transitions(sample_indices)
    print(buf.obs)
    # We can see that the desired_goal for 10 may be 6, 7, 8, 9, 10, or 11,
    # but 6-9 are states in episode 1.

if __name__ == '__main__':
    test_herreplaybuffer()

In the above code, the desired_goal may come from other episodes!

@sunkafei
Contributor Author

sunkafei commented Feb 28, 2023

@Juno-T If we change the code to sample_indices = np.array([9]), the desired_goal is 10.

Here is an explanation for my code:

  • The states for the first episode are in [0, 9]; the states for the second episode are in [10, 19]
  • The bufsize is 15, so the initial observations are [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  • After the first episode, the observations are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0]
  • As the second episode starts from 10, the observations after the second episode are [15, 16, 17, 18, 19, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
  • So the 11th observation is 10, which is the first state of the second episode (the circular-array sketch below reproduces this layout).
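
A plain circular array, independent of tianshou, reproduces the same layout:

    import numpy as np

    bufsize = 15
    buf = np.zeros(bufsize, dtype=int)
    idx = 0

    def add(state):
        global idx
        buf[idx] = state
        idx = (idx + 1) % bufsize

    for s in range(10):      # first episode: states 0..9
        add(s)
    for s in range(10, 20):  # second episode: states 10..19, wrapping around
        add(s)

    print(buf)  # [15 16 17 18 19  5  6  7  8  9 10 11 12 13 14]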

@Juno-T
Contributor

Juno-T commented Feb 28, 2023

Thank you for the clarification. I have confirmed the bug locally, and it is indeed a bug in the future_t calculation. The terminal indices can be lower than the current indices. This makes future_offset negative, so future_t takes timesteps from previous episodes.

I think this should fix it:

def rewrite_transitions(self, indices: np.ndarray) -> None:
        ...
        # Calculate future timestep to use
        current = indices[0]
        terminal = indices[-1]
        episodes_len = (terminal - current + self.maxsize) % self.maxsize
        future_offset = np.random.uniform(size=len(indices[0])) * episodes_len
        future_offset = future_offset.astype(int)
        future_t = (current + future_offset) % self.maxsize
        ...
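
A quick numeric check of the proposed arithmetic, reusing the wrapped example from above (maxsize is assumed to be 15):

    import numpy as np

    maxsize = 15  # assumed
    current = np.array([10])
    terminal = np.array([4])  # the episode wraps past the end of the buffer

    episodes_len = (terminal - current + maxsize) % maxsize  # [9], true distance
    future_offset = (np.random.uniform(size=1) * episodes_len).astype(int)  # 0..8
    future_t = (current + future_offset) % maxsize  # in {10..14, 0..3}
    print(episodes_len, future_offset, future_t)  # stays inside the same episode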

Would you like to create a pull request to fix this bug? You can add your test case to test_buffer.py and also verify some edge cases.

@sunkafei
Contributor Author

@Juno-T Thanks for your reply. I will make a pull request when I'm free.

@Trinkle23897 Trinkle23897 added the bug Something isn't working label Feb 28, 2023
sunkafei added a commit to sunkafei/tianshou that referenced this issue Feb 28, 2023
sunkafei added a commit to sunkafei/tianshou that referenced this issue Mar 1, 2023
@Trinkle23897 Trinkle23897 linked a pull request Mar 4, 2023 that will close this issue
BFAnas pushed a commit to BFAnas/tianshou that referenced this issue May 5, 2024