-
-
Notifications
You must be signed in to change notification settings - Fork 884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CartPole] Add sutton_barto_reward
argument
#958
Conversation
@pseudo-rnd-thoughts |
Why do we want this to be a v2, as opposed to just adding the optional argument to the constructor and letting users use it? And in any case, please write the reward computation a bit more explicitly, converting booleans to floats and manipulating them is clever and all that, but unreadable and annoying to maintain in the future. A ternary operator would probably work nicely here. I didn't participate in the original issue, and I'm overall not convinced this is necessary since both rewards are perfectly valid (albeit a bit different), but I won't oppose adding it as an optional setting. However, if we make it the v2 version, there are two problems:
|
As far as I can tell, new arguments have not been added mid-environment version Also I believe that the previous version was technical incorrect (even if it worked.) |
Environment versions have been used to indicate that a bug in the previous version has been fixed, which would change the dynamic, so users should be alerted to this change. Therefore, as this is an addition that can't affect previous users, this should not be a version change, particularly as a feature is turned off by default |
sutton_barto_reward
argument
Now it simply adds the new argument. I have redone the pull request. And does not bump the environment version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the two comments
I cannot understand why the testing is failing. |
I realized after merging that we need to update the vector and jax versions also what is the vectorized version's reward
|
Description
Adds
sutton_barto_reward
environment parameter to CartPole to support the reward function in Sutton and Barto, Reinforcement Learning an introduction. This parameter is turned off by default.Fixes #790
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)