-
Notifications
You must be signed in to change notification settings - Fork 113
Conversation
} | ||
|
||
int nle_done2() { | ||
pline("You can't quit now, you're having so much fun!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love these messages
nle/agent/core/vtrace.py
Outdated
vs_t_plus_1 = torch.cat( | ||
[vs[1:], broadcasted_bootstrap_values.unsqueeze(0)], dim=0 | ||
) | ||
vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for changing this file? Are you sure the broadcasting isn't needed?
Just wondering as this file is not original, but copied from elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 128 in 0424d29
vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0) |
Pulled straight out of our neurips2020release
branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave this file as-is.
nle/scripts/play.py
Outdated
# print('--------') | ||
# env.render(render_mode) | ||
# print('--------') | ||
# print('\033[31A') # Go up 31 lines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these lines supposed to be commented out? (also the ones below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will adjust these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Although our CI is still down (and should be put up again asap) can confirm the code is all blacked, tests all pass, and the only flake8 errors we have are line too long in baseline model... (aren't these fine?) |
Changes:
Needless to say I could squash restructure these commits, and maybe should for quick reversion. |
nle/env/base.py
Outdated
@@ -149,6 +149,22 @@ | |||
"\x1b[95m", | |||
"\x1b[96m", | |||
"\x1b[97m", | |||
"\033[30m", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this? Why do we have this constant?
nle/env/base.py
Outdated
observation, done, exceptions=True | ||
) | ||
if ( | ||
not self._allow_all_modes or observation[self._program_state_index][0] == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do? Doesn't seem very readable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks if death the game is over. will improve clarity
nle/scripts/play.py
Outdated
env.render(render_mode) | ||
print("--------") | ||
if not print_all_frames: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bit of a weird name for what it does (going back to override previous frames)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you are not printing all frames, then you are going back and overwriting instead. i wanted a store_true
parameter that would stop the overwriting behaviour. howabout print_frames_separately
?
nle/scripts/ttyplay.py
Outdated
@@ -10,6 +10,7 @@ | |||
import struct | |||
import termios | |||
import time | |||
from nle.nethack.actions import _ACTIONS_DICT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this as it makes ttyplay.py nethack-only all of a sudden. Better to add another script and import the relevant functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean nle
only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean now you cant just take the script and play any old ttyrec, away from nle?
nle/tests/test_envs.py
Outdated
@@ -365,7 +365,7 @@ def test_final_reward(self, env): | |||
|
|||
# Hack to quit. | |||
env.env.step(nethack.M("q")) | |||
_, reward, done, _ = env.step(env._actions.index(ord("y"))) | |||
obs, reward, done, _ = env.step(env._actions.index(ord("y"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was undoing a previous change to the tests a few commits back, and slipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obs, reward, done, _ = env.step(env._actions.index(ord("y"))) | |
_, reward, done, _ = env.step(env._actions.index(ord("y"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This polybeast version here truly is a beast, in the bad sense of the word. I'm not convinced hydra is adding anything here either, but I understand this is just a copy&paste from the agent repo.
I can see how this might not be the easiest setup to start experimenting.
cdaa283
to
f8d1ebe
Compare
I have rebased and restructured into 3 commits. I'm increasingly tempted to cut out the agent and move it to a branch - am happy to do this if we think this is sensible. |
This environment is derived from NetHackScore with some key differences: * starting character is '@' (random) by default * `perform_known_steps` is only ever called on end, meaning that menus, yes no questions and text input are all open again * the action space is as full as possible Note in this change is included the slight modification of the actions enums to prevent dangeours actions being called, and to expand to allow numbers and +/- signs. Also saving and option changing have been disabled.
actions to new file ttyplay2.py. Also the play file now overwrites frames instead of printing all frames separately.
f8d1ebe
to
50ccf04
Compare
All tests passed so rebased to include latest changes. |
I think moving the agent to a branch is a good idea. |
50ccf04
to
cf9aebf
Compare
The agent code now lives at #163. When this is merged, I will rebase that one off merge so the versions all make sense (currently the "challenge" environment doesnt exist in that branch. |
* `tty_colours` runs 0-31 (not 0-32) * `inv_strs` runs 0-255 (not 0-127) as you can name your inventory items with meta + keypress.
(Also a quick drive by fix for the observation space bug spotted on Friday) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good to me.
HELP = ord("?") # give a help message | ||
PREVMSG = C("p") # view recent game messages | ||
|
||
|
||
class Command(enum.IntEnum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this was "all commands" (or something), and the "RL relevant ones" we gathered in their own enum. What's the philosophy now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enum wasn't really all commands because it didnt contain movement etc? I figured if we were splitting out lets at least clearly mark the very dangerous ones.
no_render, | ||
render_mode, | ||
print_frames_separately, | ||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we use the **kwargs for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the debug
flag is created but does nothing in this function because it has been used previously. however this fn is called with play(**vars(flags))
, so it needs a home here. the home i thought would be clearer in **kwargs
, but i almost put **_
@@ -126,6 +135,12 @@ def process(f): | |||
continue | |||
|
|||
if channel == 1: # Input channel. | |||
os.write( | |||
1, b"\033[s\033[26;0f\033[37;1mFrame %d:\033[0m " % frame | |||
) # Save Cursor & Jump to L26 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this interfere with other saved cursors from the data stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick google suggests, yes you can only save one cursor at a time, so yes this would interfere. On test rendering i've not seen any problems though for our ttyrec2s...
This is the branch that contains the Challenge Environment and the NeurIPS baseline we will release for it.
The key differences from
NetHackChallenge
toNetHackScore
are:characeter = '@'
)PREV_MSG
andHISTORY
actions are removed (they seemed to be causing crashes)The model itself we submit as baseline is based heavily on the model in Neurips2020 paper, and posted on the branch here (
neurips2020release
). Here we have taken the best model and cut out all the other elements of the model, to keep the baseline simple, and to try to keep the model to only one file.The results of this run (for rotating characters and single characters are visible here ).
My suspicion is the model has gotten a bit slower (6000 sps) so there is probably room for a speed up. We can do this after the competition has started.