Skip to content
This repository has been archived by the owner on May 6, 2024. It is now read-only.

Eric/competition #156

Merged
merged 3 commits into from
Jun 7, 2021
Merged

Eric/competition #156

merged 3 commits into from
Jun 7, 2021

Conversation

cdmatters
Copy link
Contributor

This is the branch that contains the Challenge Environment and the NeurIPS baseline we will release for it.

The key differences from NetHackChallenge to NetHackScore are:

  • All menus allowed
  • Full key board action space allowed (with underlying code for quit, save and options switched off)
  • Rotating characters (characeter = '@')
  • PREV_MSG and HISTORY actions are removed (they seemed to be causing crashes)

The model itself we submit as baseline is based heavily on the model in Neurips2020 paper, and posted on the branch here (neurips2020release). Here we have taken the best model and cut out all the other elements of the model, to keep the baseline simple, and to try to keep the model to only one file.

The results of this run (for rotating characters and single characters are visible here ).

My suspicion is the model has gotten a bit slower (6000 sps) so there is probably room for a speed up. We can do this after the competition has started.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 25, 2021
@cdmatters cdmatters requested a review from tscmoo May 25, 2021 11:11
}

int nle_done2() {
pline("You can't quit now, you're having so much fun!");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love these messages

vs_t_plus_1 = torch.cat(
[vs[1:], broadcasted_bootstrap_values.unsqueeze(0)], dim=0
)
vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for changing this file? Are you sure the broadcasting isn't needed?
Just wondering as this file is not original, but copied from elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vs_t_plus_1 = torch.cat([vs[1:], torch.unsqueeze(bootstrap_value, 0)], dim=0)

Pulled straight out of our neurips2020release branch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave this file as-is.

# print('--------')
# env.render(render_mode)
# print('--------')
# print('\033[31A') # Go up 31 lines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these lines supposed to be commented out? (also the ones below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will adjust these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cdmatters
Copy link
Contributor Author

Although our CI is still down (and should be put up again asap) can confirm the code is all blacked, tests all pass, and the only flake8 errors we have are line too long in baseline model... (aren't these fine?)

@cdmatters
Copy link
Contributor Author

Changes:

  • Quitting has been reenabled, to allow for safe clean up of NLE
  • The competition environment now aborts if you go 10000 steps without advancing in game timer
  • You can load the polyhydra in test mode (it was broken before)

Needless to say I could squash restructure these commits, and maybe should for quick reversion.

@cdmatters cdmatters requested a review from heiner June 3, 2021 21:51
nle/env/base.py Outdated
@@ -149,6 +149,22 @@
"\x1b[95m",
"\x1b[96m",
"\x1b[97m",
"\033[30m",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this? Why do we have this constant?

nle/env/base.py Outdated
observation, done, exceptions=True
)
if (
not self._allow_all_modes or observation[self._program_state_index][0] == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Doesn't seem very readable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks if death the game is over. will improve clarity

env.render(render_mode)
print("--------")
if not print_all_frames:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit of a weird name for what it does (going back to override previous frames)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are not printing all frames, then you are going back and overwriting instead. i wanted a store_true parameter that would stop the overwriting behaviour. howabout print_frames_separately?

@@ -10,6 +10,7 @@
import struct
import termios
import time
from nle.nethack.actions import _ACTIONS_DICT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this as it makes ttyplay.py nethack-only all of a sudden. Better to add another script and import the relevant functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean nle only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean now you cant just take the script and play any old ttyrec, away from nle?

@@ -365,7 +365,7 @@ def test_final_reward(self, env):

# Hack to quit.
env.env.step(nethack.M("q"))
_, reward, done, _ = env.step(env._actions.index(ord("y")))
obs, reward, done, _ = env.step(env._actions.index(ord("y")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was undoing a previous change to the tests a few commits back, and slipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
obs, reward, done, _ = env.step(env._actions.index(ord("y")))
_, reward, done, _ = env.step(env._actions.index(ord("y")))

Copy link
Contributor

@heiner heiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This polybeast version here truly is a beast, in the bad sense of the word. I'm not convinced hydra is adding anything here either, but I understand this is just a copy&paste from the agent repo.

I can see how this might not be the easiest setup to start experimenting.

@cdmatters cdmatters force-pushed the eric/competition branch 2 times, most recently from cdaa283 to f8d1ebe Compare June 6, 2021 15:27
@cdmatters
Copy link
Contributor Author

I have rebased and restructured into 3 commits. I'm increasingly tempted to cut out the agent and move it to a branch - am happy to do this if we think this is sensible.

cdmatters added 2 commits June 6, 2021 08:29
This environment is derived from NetHackScore with some key differences:
    * starting character is '@' (random) by default
    * `perform_known_steps` is only ever called on end, meaning that
    menus, yes no questions and text input are all open again
    * the action space is as full as possible

Note in this change is included the slight modification of the actions
enums to prevent dangeours actions being called, and to expand to allow
numbers and +/- signs.

Also saving and option changing have been disabled.
actions to new file ttyplay2.py.  Also the play file now overwrites
frames instead of printing all frames separately.
@cdmatters
Copy link
Contributor Author

All tests passed so rebased to include latest changes.

@heiner
Copy link
Contributor

heiner commented Jun 6, 2021

I have rebased and restructured into 3 commits. I'm increasingly tempted to cut out the agent and move it to a branch - am happy to do this if we think this is sensible.

I think moving the agent to a branch is a good idea.

@cdmatters
Copy link
Contributor Author

The agent code now lives at #163. When this is merged, I will rebase that one off merge so the versions all make sense (currently the "challenge" environment doesnt exist in that branch.

* `tty_colours` runs 0-31 (not 0-32)
* `inv_strs` runs 0-255 (not 0-127) as you can name your inventory items
with meta + keypress.
@cdmatters
Copy link
Contributor Author

(Also a quick drive by fix for the observation space bug spotted on Friday)

Copy link
Contributor

@heiner heiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me.

HELP = ord("?") # give a help message
PREVMSG = C("p") # view recent game messages


class Command(enum.IntEnum):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this was "all commands" (or something), and the "RL relevant ones" we gathered in their own enum. What's the philosophy now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum wasn't really all commands because it didnt contain movement etc? I figured if we were splitting out lets at least clearly mark the very dangerous ones.

no_render,
render_mode,
print_frames_separately,
**kwargs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we use the **kwargs for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the debug flag is created but does nothing in this function because it has been used previously. however this fn is called with play(**vars(flags)), so it needs a home here. the home i thought would be clearer in **kwargs, but i almost put **_

@@ -126,6 +135,12 @@ def process(f):
continue

if channel == 1: # Input channel.
os.write(
1, b"\033[s\033[26;0f\033[37;1mFrame %d:\033[0m " % frame
) # Save Cursor & Jump to L26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this interfere with other saved cursors from the data stream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick google suggests, yes you can only save one cursor at a time, so yes this would interfere. On test rendering i've not seen any problems though for our ttyrec2s...

@cdmatters cdmatters merged commit fc2483a into master Jun 7, 2021
@heiner heiner deleted the eric/competition branch July 9, 2021 18:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants