[Feature Request] Replay Buffer features #994

Closed

matteobettini opened this issue Mar 27, 2023 · 7 comments

@matteobettini
Contributor

Hello,

I was playing around with the replay buffers and found some behaviors that surprised me, and I was wondering about the reasoning behind them.

  1. When I call buffer.sample(batch_size), it warns that this is deprecated and that the correct way is to set the batch_size at init. But what if I want to change the size of my sampled batch at every call?
  2. When I use sampling without replacement, I can keep calling buffer.sample() forever without any error being thrown. How do I know when I'm out of data? And what data am I getting once I am out of data?

These are the two main issues I faced in my code; the snippet below illustrates both.
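
For concreteness, a minimal sketch of both behaviors (the data is a placeholder; imports assume the current torchrl layout):

    import torch
    from tensordict import TensorDict
    from torchrl.data import ReplayBuffer, LazyTensorStorage
    from torchrl.data.replay_buffers.samplers import SamplerWithoutReplacement

    data = TensorDict({"obs": torch.randn(100, 4)}, batch_size=[100])

    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100),
        sampler=SamplerWithoutReplacement(),
        batch_size=50,
    )
    buffer.extend(data)  # 100 frames in the storage

    buffer.sample(50)  # issue 1: warns that passing batch_size here is deprecated
    buffer.sample()    # consumes the second half of the data
    buffer.sample()    # issue 2: no error is thrown; still returns a full batch of 50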

More generally, I would like not to commit at init time to a specific batch_size or replacement strategy.
I would like to decide, at every call to sample(), how many samples to receive and whether to use replacement, and to have an error thrown (or a sample of size 0 returned) when I am out of data. Something like the hypothetical sketch below:
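
    # Purely hypothetical: neither the per-call `replacement` keyword nor the
    # out-of-data error exists today; this is the behavior I am proposing.
    buffer = ReplayBuffer(storage=LazyTensorStorage(1000))
    buffer.extend(data)
    batch = buffer.sample(64, replacement=True)    # pick size and strategy per call
    batch = buffer.sample(128, replacement=False)
    buffer.sample(128, replacement=False)          # raises (or returns 0 samples) once exhausted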

@matteobettini matteobettini added the enhancement New feature or request label Mar 27, 2023
@matteobettini matteobettini changed the title [Feature Request] [Feature Request] Replay Buffer features Mar 27, 2023
@vmoens
Contributor

vmoens commented Mar 27, 2023

When I use sampling without replacement, I can keep calling buffer.sample() forever without any error being thrown. How do I know when I'm out of data? And what data am I getting once I am out of data?

If it's with replacement, you never run out of data, right? If it's without (SamplerWithoutReplacement), iterating over the replay buffer should end when you're out of data. However, you should still be able to sample.
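
A minimal sketch of that distinction (do_something is a placeholder for whatever consumes the batch):

    # Iteration consumes the buffer and stops once SamplerWithoutReplacement
    # runs out of data.
    for batch in buffer:
        do_something(batch)

    # A direct call does not stop; it keeps returning batches.
    batch = buffer.sample()
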
We could think about passing batch_size to overwrite the batch_size of the sampler but here's why we made the change:

  • when using prefetch the only way we can do that is when we know the batch size in advance
  • when using drop_last for the sampler without replacement we need to know the batch size in advance too (otherwise one cannot know whether the next batch will be bigger than what is left in the buffer)

For your final point, I'm open to discussing it, as long as we can still iterate over the buffer as we do now, which requires the batch_size to be set in advance.

EDIT: I was peeking at reddit this weekend and the main criticism of torchrl is that it is too complex. I feel that passing the batch_size every time is more complex than not. It's a special use case: in most cases the batch size does not change, and I think we should not constrain users to pass it every time. Happy to support the feature you're asking for, but I'm not sure it should be the default.

@matteobettini
Contributor Author

iterating over the replay buffer should end when you're out of data. However, you should still be able to sample.

And what will I sample when I am out of data: the last valid data from before I ran out?

Don't you find it strange to query an empty buffer and get back data at your normal batch_size?

  • when using prefetch the only way we can do that is when we know the batch size in advance
  • when using drop_last for the sampler without replacement we need to know the batch size in advance too (otherwise one cannot know whether the next batch will be bigger than what is left in the buffer)

I see the point about prefetch. But for drop_last, couldn't the sampler just look at how much data is left in the buffer and decide on the spot based on that?

The way I picture a buffer, we have some stored data. At any point I should be able to take as much as I want and decide on the spot whether to use replacement (unless I prefetch things, which will of course not allow me to change my mind).

@vmoens
Contributor

vmoens commented Mar 27, 2023

And what will I sample when I am out of data: the last valid data from before I ran out?

There are two different things:
One is buffer.sample, which just gets a batch of samples from the buffer. I don't know if it's "strange" that you get something; to me it's more like "give me a bag of stuff from the buffer". Without replacement, it's simply "give me a bag of stuff and make sure it hasn't been sampled already since the last reset".
If you want to know whether it has run out, simply check sampler.ran_out: it will tell you whether you have data left.
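
A minimal sketch of that check (keeping a handle on the sampler; data is a placeholder holding 100 frames):

    sampler = SamplerWithoutReplacement()
    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100), sampler=sampler, batch_size=50
    )
    buffer.extend(data)

    buffer.sample()
    print(sampler.ran_out)  # False: 50 frames have not been sampled yet
    buffer.sample()
    print(sampler.ran_out)  # True: everything has been sampled once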

The other is __iter__. For __iter__ it is more like "go through the buffer and sample until a stopping condition is met". This one should be used if you're "consuming" your buffer.
Now, the fact that it's composable gives you access to the sampler, so we could do something like set_batch_size or whatever else.

But for drop_last, couldn't the sampler just look at how much data is in the buffer and decide based on that on the spot?

No, because if I'm sampling 128 elements now, and next time I'm sampling 1, and there are 64 left, is the next batch the last one or not? With a batch size of 1 it isn't, but without knowing the batch size in advance there is no way for the sampler to tell.

The way I picture a buffer, we have some stored data. At any point I should be able to take as much as I want and decide on the spot whether to use replacement (unless I prefetch things, which will of course not allow me to change my mind).

Note that you can share the storage among multiple buffers at no cost; not sure if that would cover what you have in mind. But prefetch is a valuable feature and it requires a certain degree of predictability. Also, the composability of the buffer makes it so that if you want a sampler that can have a drop_last, sample without replacement "on the spot", and also do the opposite, then I'm having a hard time understanding what sort of sampler can sample both with and without replacement and what that means in practice...
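
For instance, something along these lines (a sketch; RandomSampler stands in for the with-replacement default, and I'm assuming data written through one buffer is visible to the other since the storage is shared):

    from torchrl.data.replay_buffers.samplers import RandomSampler

    storage = LazyTensorStorage(1000)
    rb_with = ReplayBuffer(storage=storage, sampler=RandomSampler(), batch_size=64)
    rb_without = ReplayBuffer(
        storage=storage, sampler=SamplerWithoutReplacement(), batch_size=64
    )

    rb_with.extend(data)  # lands in the shared storage
    rb_with.sample()      # with replacement
    rb_without.sample()   # without replacement, tracked by its own sampler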

Final point: the goal of having a composable replay buffer is that if you have an unusual use case in mind, you can code your own sampler / storage / writer. We don't claim we support every single use case, but we certainly claim we want to enable you to do whatever you want without having to rewrite the whole buffer class!
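
As a rough sketch of what that could look like (the base-class hooks here, sample returning an index tensor plus an info dict and _empty resetting state, are my reading of the Sampler interface and may not match it verbatim):

    import torch
    from torchrl.data.replay_buffers.samplers import Sampler

    class NewestFirstSampler(Sampler):
        """Hypothetical sampler that returns the most recently written entries."""

        def sample(self, storage, batch_size):
            # Take the last `batch_size` indices of the storage.
            index = torch.arange(len(storage) - batch_size, len(storage))
            return index, {}  # indices plus an (empty) info dict

        def _empty(self):
            # No internal state to reset.
            pass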

@matteobettini
Contributor Author

I see, thanks for all that info, it clears things up.

The thing that confused me is this:

    buffer = ReplayBuffer(
        storage=LazyTensorStorage(frames_per_batch),
        sampler=SamplerWithoutReplacement(),
        batch_size=frames_per_batch,
    )
    buffer.extend(my_data_of_size_frames_per_batch)

    buffer.sample()  # I get my data
    buffer.sample()  # doesn't throw an error, and what do I get?

Without replacement, it's simply "give me a bag of stuff and make sure it hasn't been sampled already since the last reset".

The second bag I got had already been sampled, and I did not reset anything.

@vmoens
Contributor

vmoens commented Mar 27, 2023

Nope, because it resets automatically, unlike the iterator.

In practice we could raise an exception when the buffer is empty; happy to see that feature.

@vmoens vmoens added the Good first issue A good way to start hacking torchrl! label Mar 27, 2023
@matteobettini
Contributor Author

Yep, it is just that automatic reset that I didn't know about.

Maybe the sampler without replacement could have a parameter called auto_reset? Something like:
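
A hypothetical usage of the flag (auto_reset does not exist today; this is just the proposal):

    sampler = SamplerWithoutReplacement(auto_reset=False)  # hypothetical flag
    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100), sampler=sampler, batch_size=50
    )
    buffer.extend(data)  # 100 frames

    buffer.sample()
    buffer.sample()  # the sampler is now exhausted
    buffer.sample()  # would raise instead of silently resetting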

@vmoens
Contributor

vmoens commented Mar 30, 2023

Solved by #1003

@vmoens vmoens closed this as completed Mar 30, 2023