Skip to content
This repository has been archived by the owner on May 6, 2024. It is now read-only.

Improve performance #131

Merged
merged 2 commits into from
Nov 16, 2020
Merged

Improve performance #131

merged 2 commits into from
Nov 16, 2020

Conversation

cdmatters
Copy link
Contributor

@cdmatters cdmatters commented Nov 15, 2020

TL:DR

Having profiled the code using the tests presented in PR #130 , it became clear the performance of the screen descriptions was adding some quite considerable hits, not just to its own observation keys but everyone. Below are two commits, offering two solutions - I prefer the last which is the fastest.

Master

master

Commit 252bwd19:

This commit does two things: First it only gets screen_desriptions from nethack if they are asked for, and secondly, it uses strncpy instead of a for loop. Each of these changes is pretty significant, the former adding 2-3ksps to everyone's performance and the latter adding the same just to screen_descriptions obskey environments. The price paid is a rather ugly access of the global observation object in the function, but the performance improvement is significant.

strncpy

Commit a1793a6

This commit replaces the array of std::strings with one big char array that we strncpy to and then memcopy out. The use of these built ins adds a lot of speed, but we also need to only fill the array with 0's when necessary, so there is another global observation object check.

memcopy

There is one final thing we can do to improve performance (at the cost of the sensible engineering practice) - we could write the string descriptions straight into the observation. It feels kind of gross, but its worth remembering that we more or less will be doing the same stuff with the tty observations.

There are two performance improvements.
* The first is only storing screen_descriptions if the observation key
is present. This increases performance by about  ~2-3k sps on average,
for those environments that don't use the observation key
* The second is changing from a manual copy to strncpy over c_strings.
This seems to add another ~3k to the performance of the screen_descriptions
observation.
This improves performance by another or 2k sps, as we can now use the very
fast memcpy and strncpy for our copy operations.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 15, 2020
@cdmatters cdmatters requested review from heiner and rockt November 16, 2020 09:54
Copy link
Contributor

@rockt rockt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not in a position to assess the intricacies of this, but from my view this looks good.

Copy link
Contributor

@heiner heiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like both changes. Thanks for checking this, this is worse than I assumed!

Does this PR as-is make NLE as fast when not using screen_descriptions (which I feel may not be super useful to most users) as it was before we added that feature? If not, we could consider adding an additional code path to get that information.

There is one final thing we can do to improve performance (at the cost of the sensible engineering practice) - we could write the string descriptions straight into the observation.

I wouldn't necessarily be against that. It would need to be combined with a clearer contract on what set_buffers does when called between resets / after the first reset. I'm happy to not even allow it being called then tbh.

@@ -885,7 +876,9 @@ NetHackRL::rl_print_glyph(winid wid, XCHAR_P x, XCHAR_P y, int glyph,
if (wid == WIN_MAP) {
instance->store_glyph(x, y, glyph);
instance->store_mapped_glyph(ch, color, special, x, y);
instance->store_screen_description(x, y, glyph);
if (nle_get_obs()->screen_descriptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This somewhat violates the contract -- in theory, a user could request the screen_descriptions observation only after the current step has happened. (In practise, we may just not want to support that and demand users reset after changes to the observation arrays, or indeed decide at the very beginning.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think we you should ever be able to change the observation arrays. I cant think of a good use case, where you would have to do that and could not just create a new environment.

Also, it might be sensible to program a bit defensively and wrap this into a function returning boolean to indicate we aren't changing pr using the observation.

@cdmatters
Copy link
Contributor Author

Does this PR as-is make NLE as fast when not using screen_descriptions (which I feel may not be super useful to most users) as it was before we added that feature? If not, we could consider adding an additional code path to get that information.

So I tested this against v0.5.2 and somewhat dubiously got segfaults! Given our troubles with issue #120 , another reason to be happy to see the back of ncurses. Running for only 10 rounds gives numbers (20,19,18 ksps) which implies the speed hit rectified by this PR was offset by removing ncurses. In general yes, with this PR turning off screen_descriptions restores the performance to what it (probably) should be.

@cdmatters
Copy link
Contributor Author

I wouldn't necessarily be against that. It would need to be combined with a clearer contract on what set_buffers does when called between resets / after the first reset. I'm happy to not even allow it being called then tbh.

I dont think users should be calling set_buffers at all, or really change observation keys after the gym environment is created.

@cdmatters cdmatters merged commit 437c9ee into master Nov 16, 2020
@cdmatters cdmatters deleted the eric/lang-perf branch November 18, 2020 17:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants