[Feature Request] Replay Buffer features #994

Closed

matteobettini opened this issue Mar 27, 2023 · 7 comments

@matteobettini
Contributor

Hello,

I was playing around with the replay buffers and found some behaviors that surprised me, and I was wondering about the reasoning behind them.

  1. When I call buffer.sample(batch_size), it warns that this is deprecated and that the correct way is to set the batch_size at init. But what if I want to change the size of my sampled batch at every call?
  2. When I use sampling without replacement, I can keep calling buffer.sample() forever without any error being thrown. How do I know when I'm out of data? And what data am I getting once I am out of data?

These are the two main issues I faced in my code; the snippet below illustrates both.
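
For concreteness, a minimal sketch of both behaviors (the data is a placeholder; imports assume the current torchrl layout):

    import torch
    from tensordict import TensorDict
    from torchrl.data import ReplayBuffer, LazyTensorStorage
    from torchrl.data.replay_buffers.samplers import SamplerWithoutReplacement

    data = TensorDict({"obs": torch.randn(100, 4)}, batch_size=[100])

    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100),
        sampler=SamplerWithoutReplacement(),
        batch_size=50,
    )
    buffer.extend(data)  # 100 frames in the storage

    buffer.sample(50)  # issue 1: warns that passing batch_size here is deprecated
    buffer.sample()    # consumes the second half of the data
    buffer.sample()    # issue 2: no error is thrown; still returns a full batch of 50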

More generally, I would like not to commit at init time to a specific batch_size or replacement strategy.
I would like to decide, at every call to sample(), how many samples to receive and whether to use replacement, and to have an error thrown (or a sample of size 0 returned) when I am out of data. Something like the hypothetical sketch below:
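
    # Purely hypothetical: neither the per-call `replacement` keyword nor the
    # out-of-data error exists today; this is the behavior I am proposing.
    buffer = ReplayBuffer(storage=LazyTensorStorage(1000))
    buffer.extend(data)
    batch = buffer.sample(64, replacement=True)    # pick size and strategy per call
    batch = buffer.sample(128, replacement=False)
    buffer.sample(128, replacement=False)          # raises (or returns 0 samples) once exhausted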

@matteobettini matteobettini added the enhancement New feature or request label Mar 27, 2023
@matteobettini matteobettini changed the title [Feature Request] [Feature Request] Replay Buffer features Mar 27, 2023
@vmoens
Contributor

vmoens commented Mar 27, 2023

When I use sampling without replacement, I can keep calling buffer.sample() forever without any error being thrown. How do I know when I'm out of data? And what data am I getting once I am out of data?

If it's with replacement, you never run out of data, right? If it's without (SamplerWithoutReplacement), iterating over the replay buffer should end when you're out of data. However, you should still be able to sample.
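
A minimal sketch of that distinction (do_something is a placeholder for whatever consumes the batch):

    # Iteration consumes the buffer and stops once SamplerWithoutReplacement
    # runs out of data.
    for batch in buffer:
        do_something(batch)

    # A direct call does not stop; it keeps returning batches.
    batch = buffer.sample()
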
We could think about passing batch_size to overwrite the batch_size of the sampler but here's why we made the change:

  • when using prefetch the only way we can do that is when we know the batch size in advance
  • when using drop_last for the sampler without replacement we need to know the batch size in advance too (otherwise one cannot know whether the next batch will be bigger than what is left in the buffer)

For your final point, I'm open to discussing it, as long as we can still iterate over the buffer as we do now, which requires the batch_size to be set in advance.

EDIT: I was peeking at reddit this weekend and the main criticism of torchrl is that it is too complex. I feel that passing the batch_size every time is more complex than not. It's a special use case: in most cases the batch size does not change, and I think we should not constrain users to pass it every time. Happy to support the feature you're asking for, but I'm not sure it should be the default.

@matteobettini
Contributor Author

iterating over the replay buffer should end when you're out of data. However, you should still be able to sample.

And what will I sample when I am out of data: the last valid data from before I ran out?

Don't you find it strange to query an empty buffer and get back data at your normal batch_size?

  • when using prefetch the only way we can do that is when we know the batch size in advance
  • when using drop_last for the sampler without replacement we need to know the batch size in advance too (otherwise one cannot know whether the next batch will be bigger than what is left in the buffer)

I see the point about prefetch. But for drop_last, couldn't the sampler just look at how much data is left in the buffer and decide on the spot based on that?

The way I picture a buffer, we have some stored data. At any point I should be able to take as much as I want and decide on the spot whether to use replacement (unless I prefetch things, which will of course not allow me to change my mind).

@vmoens
Contributor

vmoens commented Mar 27, 2023

And what will I sample when I am out of data: the last valid data from before I ran out?

There are two different things:
One is buffer.sample, which just gets a batch of samples from the buffer. I don't know if it's "strange" that you get something; to me it's more like "give me a bag of stuff from the buffer". Without replacement, it's simply "give me a bag of stuff and make sure it hasn't been sampled already since the last reset".
If you want to know whether it has run out, simply check sampler.ran_out: it will tell you whether you have data left.
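
A minimal sketch of that check (keeping a handle on the sampler; data is a placeholder holding 100 frames):

    sampler = SamplerWithoutReplacement()
    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100), sampler=sampler, batch_size=50
    )
    buffer.extend(data)

    buffer.sample()
    print(sampler.ran_out)  # False: 50 frames have not been sampled yet
    buffer.sample()
    print(sampler.ran_out)  # True: everything has been sampled once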

The other is __iter__. For __iter__ it is more like "go through the buffer and sample until a stopping condition is met". This one should be used if you're "consuming" your buffer.
Now, the fact that it's composable gives you access to the sampler, so we could do something like set_batch_size or whatever else.

But for drop_last, couldn't the sampler just look at how much data is in the buffer and decide based on that on the spot?

No, because if I'm sampling 128 elements now, and next time I'm sampling 1, and there are 64 left, is the next batch the last one or not? With a batch size of 1 it isn't, but without knowing the batch size in advance there is no way for the sampler to tell.

The way I picture a buffer, we have some stored data. At any point I should be able to take as much as I want and decide on the spot whether to use replacement (unless I prefetch things, which will of course not allow me to change my mind).

Note that you can share the storage among multiple buffers at no cost; not sure if that would cover what you have in mind. But prefetch is a valuable feature and it requires a certain degree of predictability. Also, the composability of the buffer makes it so that if you want a sampler that can have a drop_last, sample without replacement "on the spot", and also do the opposite, then I'm having a hard time understanding what sort of sampler can sample both with and without replacement and what that means in practice...
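
For instance, something along these lines (a sketch; RandomSampler stands in for the with-replacement default, and I'm assuming data written through one buffer is visible to the other since the storage is shared):

    from torchrl.data.replay_buffers.samplers import RandomSampler

    storage = LazyTensorStorage(1000)
    rb_with = ReplayBuffer(storage=storage, sampler=RandomSampler(), batch_size=64)
    rb_without = ReplayBuffer(
        storage=storage, sampler=SamplerWithoutReplacement(), batch_size=64
    )

    rb_with.extend(data)  # lands in the shared storage
    rb_with.sample()      # with replacement
    rb_without.sample()   # without replacement, tracked by its own sampler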

Final point: the goal of having a composable replay buffer is that if you have an unusual use case in mind, you can code your own sampler / storage / writer. We don't claim we support every single use case, but we certainly claim we want to enable you to do whatever you want without having to rewrite the whole buffer class!
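
As a rough sketch of what that could look like (the base-class hooks here, sample returning an index tensor plus an info dict and _empty resetting state, are my reading of the Sampler interface and may not match it verbatim):

    import torch
    from torchrl.data.replay_buffers.samplers import Sampler

    class NewestFirstSampler(Sampler):
        """Hypothetical sampler that returns the most recently written entries."""

        def sample(self, storage, batch_size):
            # Take the last `batch_size` indices of the storage.
            index = torch.arange(len(storage) - batch_size, len(storage))
            return index, {}  # indices plus an (empty) info dict

        def _empty(self):
            # No internal state to reset.
            pass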

@matteobettini
Contributor Author

I see, thanks for all that info, it clears things up.

The thing that confused me is this:

    buffer = ReplayBuffer(
        storage=LazyTensorStorage(frames_per_batch),
        sampler=SamplerWithoutReplacement(),
        batch_size=frames_per_batch,
    )
    buffer.extend(my_data_of_size_frames_per_batch)

    buffer.sample()  # I get my data
    buffer.sample()  # doesn't throw an error, and what do I get?

Without replacement, it's simply "give me a bag of stuff and make sure it hasn't been sampled already since the last reset".

The second bag I got had already been sampled, and I did not reset anything.

@vmoens
Contributor

vmoens commented Mar 27, 2023

Nope, because it resets automatically, unlike the iterator.

In practice we could raise an exception when the buffer is empty; happy to see that feature.

@vmoens vmoens added the Good first issue A good way to start hacking torchrl! label Mar 27, 2023
@matteobettini
Contributor Author

Yep, it is just that automatic reset that I didn't know about.

Maybe the sampler without replacement could have a parameter called auto_reset? Something like:
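
A hypothetical usage of the flag (auto_reset does not exist today; this is just the proposal):

    sampler = SamplerWithoutReplacement(auto_reset=False)  # hypothetical flag
    buffer = ReplayBuffer(
        storage=LazyTensorStorage(100), sampler=sampler, batch_size=50
    )
    buffer.extend(data)  # 100 frames

    buffer.sample()
    buffer.sample()  # the sampler is now exhausted
    buffer.sample()  # would raise instead of silently resetting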

@vmoens
Contributor

vmoens commented Mar 30, 2023

Solved by #1003

@vmoens vmoens closed this as completed Mar 30, 2023