SIL #158

qgallouedec · 2023-02-26T08:58:33Z

Self Imitation Learning
@emrul has implemented SAIL, see #139 (comment)

@emrul, is there an official implementation for those two? Do you match the results from the paper with your implementation?

emrul · 2023-02-26T21:46:22Z

Hi @qgallouedec - I haven't done much testing, but if there's no rush I'd love to work on this in my spare time. The official implementation appears to be here: https://github.com/google-research/google-research/tree/master/sail_rl

qgallouedec · 2023-02-27T08:25:13Z

There is no rush at all :)

richardjozsa · 2023-02-28T13:40:28Z

Hey everyone,

I tried the code that @emrul pasted in the IQN PR comments, and it works.

One thing I haven't got to work is the SubprocVecEnv wrapping. Just wanted to let you know. :)

emrul · 2023-02-28T14:06:01Z

Thanks @richardjozsa - that's interesting, because I exclusively use SubprocVecEnv for training and DummyVecEnv for evaluation. What happens when you use SubprocVecEnv?

richardjozsa · 2023-02-28T14:14:00Z

This is the error I got, but if it works for you then I will recheck. I use a custom env, so maybe that caused it.

Traceback (most recent call last):
RLTEST | File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 274, in main
RLTEST | code = _serve_one(child_r, fds,
RLTEST | File "/usr/lib/python3.10/multiprocessing/forkserver.py", line 313, in _serve_one
RLTEST | code = spawn._main(child_r, parent_sentinel)
RLTEST | File "/usr/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
RLTEST | self = reduction.pickle.load(from_parent)
RLTEST | File "/home/ftuser/.local/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 375, in __setstate__
RLTEST | self.var = cloudpickle.loads(var)
RLTEST | ModuleNotFoundError: No module named 'base'

emrul · 2023-02-28T14:19:23Z

... that looks like an error while unpickling your env in the worker process. My modifications don't change the envs at all (the replay buffer holds the SAIL returns internally), so I don't think my changes caused it.
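
For anyone hitting the same traceback, here is a minimal, stdlib-only sketch of how this kind of `ModuleNotFoundError` can arise. It uses plain `pickle` rather than `cloudpickle`, and the module name `my_env_module` and class `CustomEnv` are hypothetical stand-ins for a custom env module, so treat it as an illustration of the mechanism and not as a reproduction of the setup above. The assumption is that the env class was serialized by reference (module path plus class name) and that the module was not importable in the worker process.

```python
import pickle
import sys
import types

# Build a throwaway module (hypothetical name) that defines a custom env class,
# and register it so that pickle can resolve the class by its module path.
module = types.ModuleType("my_env_module")
exec("class CustomEnv:\n    pass\n", module.__dict__)
sys.modules["my_env_module"] = module

# Pickle stores only the reference "my_env_module.CustomEnv", not the class body.
payload = pickle.dumps(module.CustomEnv())

# Simulate a worker process in which the module cannot be imported.
del sys.modules["my_env_module"]

try:
    pickle.loads(payload)
    error = None
except ModuleNotFoundError as exc:
    error = str(exc)

print(error)
```

If that is what is happening, making the env's module importable from the worker processes (for example by installing it as a package or adding it to `PYTHONPATH`) is the usual way out.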

richardjozsa · 2023-02-28T15:29:58Z

My bad, sorry: the problem was in my environment, and it works fine. One comment: you have set the replay buffer to device="cpu". I guess that could be "auto". :)

emrul · 2023-02-28T15:45:28Z

Great, and yes - good catch on the device, I will correct that!

qgallouedec added the enhancement New feature or request label Feb 26, 2023
