Eric/competition #156

cdmatters · 2021-05-25T10:35:21Z

This is the branch that contains the Challenge Environment and the NeurIPS baseline we will release for it.

The key differences between NetHackChallenge and NetHackScore are:

* All menus allowed
* Full keyboard action space allowed (with the underlying code for quit, save and options switched off)
* Rotating characters (character = '@')
* PREV_MSG and HISTORY actions are removed (they seemed to be causing crashes)

The model we submit as the baseline is based heavily on the model in the NeurIPS 2020 paper, which is posted on the branch here (neurips2020release). We have taken the best model and cut out all the other elements, to keep the baseline simple and to try to keep the model to only one file.

The results of this run (for rotating characters and single characters) are visible here.

My suspicion is that the model has gotten a bit slower (6000 sps), so there is probably room for a speed-up. We can do this after the competition has started.

nle/agent/config.yaml

nle/agent/polybeast_env.py

tscmoo · 2021-05-25T14:24:37Z

src/cmd.c

+}
+
+int nle_done2() {
+    pline("You can't quit now, you're having so much fun!"); 


I love these messages

tscmoo · 2021-05-25T14:30:07Z

nle/agent/core/vtrace.py

-        vs_t_plus_1 = torch.cat(
-            [vs[1:], broadcasted_bootstrap_values.unsqueeze(0)], dim=0
-        )
+        vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0)


What is the rationale for changing this file? Are you sure the broadcasting isn't needed?
Just wondering as this file is not original, but copied from elsewhere.

nle/nle/agent/core/vtrace.py

Line 128 in 0424d29

vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0)

Pulled straight out of our neurips2020release branch

Let's leave this file as-is.

nle/nethack/actions.py

tscmoo · 2021-05-25T14:53:36Z

nle/scripts/play.py

+                # print('--------')
+                # env.render(render_mode)
+                # print('--------')
+                # print('\033[31A') # Go up 31 lines.


Are these lines supposed to be commented out? (also the ones below)

will adjust these

cdmatters · 2021-05-27T22:22:56Z

Although our CI is still down (and should be put up again asap), I can confirm the code is all blacked, tests all pass, and the only flake8 errors we have are line-too-long in the baseline model... (aren't these fine?)

cdmatters · 2021-06-03T21:51:23Z

Changes:

* Quitting has been re-enabled, to allow for safe clean-up of NLE
* The competition environment now aborts if you go 10000 steps without advancing the in-game timer
* You can load the polyhydra in test mode (it was broken before)
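
A minimal sketch of how such a stall check could work, assuming the environment can read the in-game turn counter after every step. The class name `NoProgressWatchdog` and its methods are hypothetical and are not taken from the NLE code in this PR:

```python
class NoProgressWatchdog:
    """Counts consecutive steps during which the in-game timer did not advance.

    Hypothetical helper for illustration only; not the NLE implementation.
    """

    def __init__(self, max_stalled_steps=10000):
        self.max_stalled_steps = max_stalled_steps
        self._last_turn = None
        self._stalled_steps = 0

    def reset(self):
        """Forget all history, e.g. at the start of a new episode."""
        self._last_turn = None
        self._stalled_steps = 0

    def update(self, turn):
        """Record the timer value after a step; return True if we should abort."""
        if self._last_turn is not None and turn <= self._last_turn:
            self._stalled_steps += 1
        else:
            # First observation, or the timer moved forward: start counting again.
            self._stalled_steps = 0
        self._last_turn = turn
        return self._stalled_steps >= self.max_stalled_steps
```

An environment wrapper would call `update` once per step and end the episode when it returns True. Menu navigation and text input do not advance the timer, so an agent stuck in a menu loop is the case this kind of check catches.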

Needless to say I could squash and restructure these commits, and maybe should for quick reversion.

heiner · 2021-06-04T11:05:34Z

nle/env/base.py

@@ -149,6 +149,22 @@
    "\x1b[95m",
    "\x1b[96m",
    "\x1b[97m",
+    "\033[30m",


What is this? Why do we have this constant?

heiner · 2021-06-04T11:06:34Z

nle/env/base.py

-            observation, done, exceptions=True
-        )
+        if (
+            not self._allow_all_modes or observation[self._program_state_index][0] == 1


What does this do? Doesn't seem very readable to me.

This checks whether the game is over. Will improve clarity.
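
One way the condition could be made more readable is to name the intermediate result. This is only a sketch: the function name is hypothetical, and the assumption that the branch guards the `perform_known_steps` call is inferred from the diff context rather than confirmed by it.

```python
def should_perform_known_steps(observation, program_state_index, allow_all_modes):
    """Hypothetical restatement of the condition from the diff.

    Assumes, as the reply above states, that the first program_state
    entry equals 1 once the game is over.
    """
    game_over = observation[program_state_index][0] == 1
    return (not allow_all_modes) or game_over
```

With `allow_all_modes` off the check always passes; with it on, the check passes only once the game has ended.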

nle/env/base.py

heiner · 2021-06-04T11:10:53Z

nle/scripts/play.py

                env.render(render_mode)
+                print("--------")
+                if not print_all_frames:


That's a bit of a weird name for what it does (going back to override previous frames)?

If you are not printing all frames, then you are going back and overwriting instead. I wanted a store_true parameter that would stop the overwriting behaviour. How about print_frames_separately?
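
A small sketch of the behaviour being named here, assuming a frame that occupies 31 terminal lines (the figure used in the commented-out code earlier in this review). The function `frame_separator` is hypothetical and not part of play.py:

```python
import sys


def frame_separator(print_frames_separately, frame_height=31):
    """Return the text to emit after one frame has been rendered.

    Hypothetical helper; frame_height is an assumed value.
    """
    separator = "--------\n"
    if print_frames_separately:
        return separator
    # ESC [ <n> A moves the cursor up n lines, so the next frame
    # is drawn on top of the current one instead of below it.
    return separator + "\033[%dA" % frame_height


if __name__ == "__main__":
    for frame in ("frame 1", "frame 2"):
        sys.stdout.write(frame + "\n")
        sys.stdout.write(frame_separator(print_frames_separately=True))
```

With the flag set, frames scroll past one after another; without it, each frame overwrites the previous one in place.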

heiner · 2021-06-04T11:11:54Z

nle/scripts/ttyplay.py

@@ -10,6 +10,7 @@
 import struct
 import termios
 import time
+from nle.nethack.actions import _ACTIONS_DICT


I don't like this as it makes ttyplay.py nethack-only all of a sudden. Better to add another script and import the relevant functions.

Do you mean nle only?

Do you mean that now you can't just take the script and play any old ttyrec, away from nle?

heiner · 2021-06-04T11:13:01Z

nle/tests/test_envs.py

@@ -365,7 +365,7 @@ def test_final_reward(self, env):

        # Hack to quit.
        env.env.step(nethack.M("q"))
-        _, reward, done, _ = env.step(env._actions.index(ord("y")))
+        obs, reward, done, _ = env.step(env._actions.index(ord("y")))


Why this change?

i was undoing a previous change to the tests a few commits back, and slipped.

Suggested change

-        obs, reward, done, _ = env.step(env._actions.index(ord("y")))
+        _, reward, done, _ = env.step(env._actions.index(ord("y")))

heiner

This polybeast version here truly is a beast, in the bad sense of the word. I'm not convinced hydra is adding anything here either, but I understand this is just a copy&paste from the agent repo.

I can see how this might not be the easiest setup to start experimenting.

cdmatters · 2021-06-06T15:28:42Z

I have rebased and restructured into 3 commits. I'm increasingly tempted to cut out the agent and move it to a branch - am happy to do this if we think this is sensible.

This environment is derived from NetHackScore with some key differences:

* starting character is '@' (random) by default
* `perform_known_steps` is only ever called on end, meaning that menus, yes/no questions and text input are all open again
* the action space is as full as possible

Note that this change includes a slight modification of the action enums to prevent dangerous actions being called, and to expand them to allow numbers and +/- signs. Also, saving and option changing have been disabled.

Fixup scripts by adding keypresses to ttyplay.py, and nle specific actions to new file ttyplay2.py. Also the play file now overwrites frames instead of printing all frames separately.

cdmatters · 2021-06-06T15:37:49Z

All tests passed so rebased to include latest changes.

heiner · 2021-06-06T19:12:22Z

I have rebased and restructured into 3 commits. I'm increasingly tempted to cut out the agent and move it to a branch - am happy to do this if we think this is sensible.

I think moving the agent to a branch is a good idea.

cdmatters · 2021-06-07T08:43:03Z

The agent code now lives at #163. When this is merged, I will rebase that one off the merge so the versions all make sense (currently the "challenge" environment doesn't exist in that branch).

* `tty_colours` runs 0-31 (not 0-32)
* `inv_strs` runs 0-255 (not 0-127) as you can name your inventory items with meta + keypress.
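
To see why meta + keypress pushes `inv_strs` past 127, here is a short sketch. It assumes the meta modifier sets the high bit of the key code; the local `M` below is a stand-in written for this example and is not imported from `nle.nethack.actions`:

```python
def M(c):
    """Meta-modified key code: the plain key with the high bit set.

    Local stand-in for illustration; assumed to match the usual
    meta-key convention.
    """
    return 0x80 | ord(c)


# Every printable ASCII key, and its meta-modified counterpart.
printable = [chr(code) for code in range(32, 127)]
meta_codes = [M(c) for c in printable]

lowest, highest = min(meta_codes), max(meta_codes)
```

Under this assumption every meta-modified printable key lands between 160 and 254, so a byte-valued observation that can contain such characters needs an upper bound of 255 rather than 127.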

cdmatters · 2021-06-07T08:47:22Z

(Also a quick drive by fix for the observation space bug spotted on Friday)

heiner

Generally looks good to me.

heiner · 2021-06-07T10:28:25Z

nle/nethack/actions.py

+    HELP = ord("?")  # give a help message
+    PREVMSG = C("p")  # view recent game messages
+
+
 class Command(enum.IntEnum):


Previously, this was "all commands" (or something), and the "RL relevant ones" we gathered in their own enum. What's the philosophy now?

This enum wasn't really all commands because it didn't contain movement etc.? I figured if we were splitting things out, let's at least clearly mark the very dangerous ones.

heiner · 2021-06-07T10:28:58Z

nle/scripts/play.py

+    no_render,
+    render_mode,
+    print_frames_separately,
+    **kwargs,


What do we use the **kwargs for?

The debug flag is created but does nothing in this function because it has already been used earlier. However, this function is called with play(**vars(flags)), so the flag needs a home here. I thought **kwargs would be the clearer home, but I almost put **_.
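
The pattern described above can be sketched as follows. The parser, the flag defaults and the body of `play` are placeholders invented for this example; only the `play(**vars(flags))` call shape comes from the discussion:

```python
import argparse


def play(env, no_render, **kwargs):
    """Placeholder play(): flags it does not use, such as debug, land in kwargs."""
    return {"env": env, "no_render": no_render, "ignored": sorted(kwargs)}


def parse_flags(argv):
    """Build a small parser with one flag that play() never reads."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="example-env")  # placeholder env name
    parser.add_argument("--no-render", action="store_true")
    parser.add_argument("--debug", action="store_true")  # consumed before play()
    return parser.parse_args(argv)


flags = parse_flags(["--debug"])
# vars() turns the Namespace into a dict, so every flag becomes a keyword argument.
result = play(**vars(flags))
```

Without `**kwargs` (or `**_`), the call would raise a TypeError for the unexpected `debug` keyword.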

heiner · 2021-06-07T10:30:21Z

nle/scripts/ttyplay.py

@@ -126,6 +135,12 @@ def process(f):
            continue

        if channel == 1:  # Input channel.
+            os.write(
+                1, b"\033[s\033[26;0f\033[37;1mFrame %d:\033[0m " % frame
+            )  # Save Cursor & Jump to L26


could this interfere with other saved cursors from the data stream?

A quick Google search suggests that you can only save one cursor at a time, so yes, this would interfere. In test rendering I've not seen any problems for our ttyrec2s, though...
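
The interference can be illustrated with a toy model, assuming a terminal that keeps a single slot for the saved cursor position (`ESC[s` to save, `ESC[u` to restore). `OneSlotCursor` is a hypothetical simulation and not a real terminal API:

```python
class OneSlotCursor:
    """Toy terminal cursor with a single save slot (an assumption, see above)."""

    def __init__(self):
        self.pos = (1, 1)
        self.saved = None

    def move(self, row, col):
        self.pos = (row, col)

    def save(self):
        # A second save overwrites whatever was stored before.
        self.saved = self.pos

    def restore(self):
        if self.saved is not None:
            self.pos = self.saved


term = OneSlotCursor()
term.move(5, 10)
term.save()       # the recorded stream saves its cursor
term.move(7, 3)
term.save()       # the frame-counter overlay saves too, clobbering the slot
term.move(26, 1)  # overlay jumps to line 26 to print "Frame N:"
term.restore()    # overlay returns to (7, 3), as intended
term.restore()    # the stream's own restore now also yields (7, 3)
```

In this model the stream expected to return to (5, 10) but ends up at (7, 3). That matches the concern raised in the question; whether it shows up in practice depends on whether the recorded data ever uses save and restore itself.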

facebook-github-bot added the CLA Signed label May 25, 2021

cdmatters requested a review from tscmoo May 25, 2021 11:11

tscmoo reviewed May 25, 2021

cdmatters requested a review from heiner June 3, 2021 21:51

heiner reviewed Jun 4, 2021

cdmatters force-pushed the eric/competition branch 2 times, most recently from cdaa283 to f8d1ebe Compare June 6, 2021 15:27

cdmatters added 2 commits June 6, 2021 08:29

Fixup scripts by adding keypresses to ttyplay.py, and nle specific actions to new file ttyplay2.py. Also the play file now overwrites frames instead of printing all frames separately.

cf9aebf

cdmatters force-pushed the eric/competition branch from f8d1ebe to 50ccf04 Compare June 6, 2021 15:37

cdmatters force-pushed the eric/competition branch from 50ccf04 to cf9aebf Compare June 7, 2021 08:41

Fix range of observation spaces.

dbf53f0

* `tty_colours` runs 0-31 (not 0-32)
* `inv_strs` runs 0-255 (not 0-127) as you can name your inventory items with meta + keypress.

heiner approved these changes Jun 7, 2021

cdmatters merged commit fc2483a into master Jun 7, 2021

heiner deleted the eric/competition branch July 9, 2021 18:02
