-
Notifications
You must be signed in to change notification settings - Fork 113
Conversation
There are two performance improvements. * The first is only storing screen_descriptions if the observation key is present. This increases performance by about ~2-3k sps on average, for those environments that don't use the observation key * The second is changing from a manual copy to strncpy over c_strings. This seems to add another ~3k to the performance of the screen_descriptions observation.
This improves performance by another or 2k sps, as we can now use the very fast memcpy and strncpy for our copy operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not in a position to assess the intricacies of this, but from my view this looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like both changes. Thanks for checking this, this is worse than I assumed!
Does this PR as-is make NLE as fast when not using screen_descriptions (which I feel may not be super useful to most users) as it was before we added that feature? If not, we could consider adding an additional code path to get that information.
There is one final thing we can do to improve performance (at the cost of the sensible engineering practice) - we could write the string descriptions straight into the observation.
I wouldn't necessarily be against that. It would need to be combined with a clearer contract on what set_buffers does when called between resets / after the first reset. I'm happy to not even allow it being called then tbh.
@@ -885,7 +876,9 @@ NetHackRL::rl_print_glyph(winid wid, XCHAR_P x, XCHAR_P y, int glyph, | |||
if (wid == WIN_MAP) { | |||
instance->store_glyph(x, y, glyph); | |||
instance->store_mapped_glyph(ch, color, special, x, y); | |||
instance->store_screen_description(x, y, glyph); | |||
if (nle_get_obs()->screen_descriptions) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This somewhat violates the contract -- in theory, a user could request the screen_descriptions observation only after the current step has happened. (In practise, we may just not want to support that and demand users reset after changes to the observation arrays, or indeed decide at the very beginning.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we you should ever be able to change the observation arrays. I cant think of a good use case, where you would have to do that and could not just create a new environment.
Also, it might be sensible to program a bit defensively and wrap this into a function returning boolean to indicate we aren't changing pr using the observation.
So I tested this against v0.5.2 and somewhat dubiously got segfaults! Given our troubles with issue #120 , another reason to be happy to see the back of ncurses. Running for only 10 rounds gives numbers (20,19,18 ksps) which implies the speed hit rectified by this PR was offset by removing ncurses. In general yes, with this PR turning off screen_descriptions restores the performance to what it (probably) should be. |
I dont think users should be calling |
TL:DR
Having profiled the code using the tests presented in PR #130 , it became clear the performance of the screen descriptions was adding some quite considerable hits, not just to its own observation keys but everyone. Below are two commits, offering two solutions - I prefer the last which is the fastest.
Master
Commit 252bwd19:
This commit does two things: First it only gets screen_desriptions from nethack if they are asked for, and secondly, it uses
strncpy
instead of a for loop. Each of these changes is pretty significant, the former adding 2-3ksps to everyone's performance and the latter adding the same just to screen_descriptions obskey environments. The price paid is a rather ugly access of the global observation object in the function, but the performance improvement is significant.Commit a1793a6
This commit replaces the array of std::strings with one big char array that we strncpy to and then memcopy out. The use of these built ins adds a lot of speed, but we also need to only fill the array with 0's when necessary, so there is another global observation object check.
There is one final thing we can do to improve performance (at the cost of the sensible engineering practice) - we could write the string descriptions straight into the observation. It feels kind of gross, but its worth remembering that we more or less will be doing the same stuff with the tty observations.