Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce weariness instability (includes tests); clarified #47653

Merged
merged 22 commits into from
Apr 7, 2021

Conversation

actual-nh
Copy link
Contributor

@actual-nh actual-nh commented Feb 21, 2021

Summary

Bugfixes "Alter weary.tracker, weary.intake changes to be smoother; add tests for illogical fluctuations in weary level"

Purpose of change

This PR is to help avoid situations with unexpected fluctuations of weary levels, such as - during the same activity - the weary level reversing direction. (For instance, going from 2 to 3 during rest.)

These changes were mostly worked out in #47273. They reduce the current jumps in weary.tracker (and partially in weary.intake); these jumps, as demonstrated in test results (see #47273 and #47590), are partially responsible for weary level fluctuations. (Note that the test results for #47590 were without any modifications to try_reduce_weariness().)

It also adds tests for these fluctuations, although two are commented out since this PR does not completely fix the problem. (That will require more significant alterations in weary.intake - probably going to an exponential moving average (see #46906 for a bit on this) - which will take only a bit more work but require significantly more testing.) (See also #46384 and #48009; this PR is not a full fix for these, but should reduce the frequency of problems.)

Describe the solution

Decrement weary.tracker on a more frequent basis but - to compensate - reduce it by a smaller amount. (This will be every 5 minutes, which is how often try_reduce_weariness() is called.)

Also do this - decrementing more frequently but by a smaller amount - to weary.intake. Unlike with weary.tracker, this has not been done as much as it theoretically could be. This process involves a multiplication that is likely to result in more significant roundoff errors if done in much smaller amounts than the current PR. (By "more significant" is meant frequently having no reduction in weary.intake. The purpose behind the exact multiplier in use is also rather unclear.)There is some possibility of such roundoff errors happening with this PR's adjustments, but rather less so than if one tries multiplying by the 1/12th power of something at or above 0.95. (Note in this regard #47791's most recent results - specifically, the consistency across architectures and compilers, provided the testing seed is the same. Given that #47791 incorporates this PR's changes, this suggests that roundoff is not a significant problem with this PR.)

Describe alternatives you've considered

As noted above, this PR could go all the way to changing weary.intake completely. It could also try reducing weary.intake jumps further, at the risk of significant roundoff error.

The have_weary_increase() test function is not currently used; it can be deleted if desired. Done.

One could also stabilize the weary level by implementing hysteresis - as in, increase weary level at, say, 125% of the current threshold, but only decrease it at, say, 75%. This would be rather complicated due to the frequent threshold changes.

Testing

Test functions have been added to activity_scheduling_helper.cpp/h, and used in weary_test.cpp, to check for illogical fluctuations. (As noted above, these have been commented out in two instances since this PR is not a complete fix.) For other testing, see #47273.

Testing just the weary tests in a different order (with the 24-hour tests first, for instance) results in errors from both the new tests and existing tests. However, the same thing - with just the existing tests, of course - happens with the current master.

Additional context

Ping: @anothersimulacrum, @I-am-Erk?

Start on adding tests for unrealistic fluctuations in weary level;
see CleverRaven#46384 (and some cases in CleverRaven#46941) for example problems. The initial
tests look for problems with the weary_recovery task of digging for 8
hours then waiting for 8 hours; weary level should not go down in the
first 8 hours, and should not go up in the second 8 hours.

(cherry picked from commit bdd942b)
In some conditions, namely continuous exercise at the same level, a
decrease in weariness level is unrealistic. Check for this.

(cherry picked from commit 49fc687)
Weary levels keep fluctuating unrealistically, probably because
weary.intake (and weary.tracker?) are changing in large jumps at times.

(cherry picked from commit db03ee5)
This adjusts expected test values to the (quite consistent) ones for
the altered weary intake/tracker. It also does the weary.tracker
adjustment in a less-perfectionistic (but more-functional) way.

(cherry picked from commit 028b0f0)
This is an extension of a previous modification to try to smooth out
the weary.intake reduction (smaller changes but more frequent), and
both increases that and does similar for weary.tracker. The
weary.intake changes are leading up to - probably after 0.F - an
exponential moving average being used instead, so that characters
are not, essentially hypoglycemic.

(cherry picked from commit ab4ba67)
As far as I can tell, the cause of the weary.tracker going up during
resting periods is that rest does still expend calories (bmr),
and it happens every 5 minutes - while weary.tracker was only reduced
every 30 (current) or 15 (this branch) minutes. This commit makes
weary.tracker reduction occur every 5 minutes - every time
try_reduce_weariness() is called.

(cherry picked from commit 5d9dcb2)
Some of the test times were being altered by the every-30-minute
(awake) weary.tracker reductions. Alter to match new ones, also taking
into account local testing (including in scrambled order). While with
some scrambled tests am seeing inconsistencies between 8-hour and
12-hour digging, 8-hour without fluctuations indicated 3->4 should not
be 470 minutes, but no more than 465 - which it already was for
12-hour, weirdly enough (oops by me earlier?).

(cherry picked from commit c68507b)
It is going to require more work - most likely adjusting weary.intake
to an exponential moving average - to get rid of these two failed
tests. (Note that these were added tests in the first place; the
failure - weariness fluctuation - in question was happening before
but simply wasn't detected.)
This clears the avatar prior to the (potential) debug output so what
is actually the case inside do_activity can be seen.
@anothersimulacrum anothersimulacrum self-requested a review February 21, 2021 20:45
@BrettDong BrettDong added Code: Tests Measurement, self-control, statistics, balancing. [C++] Changes (can be) made in C++. Previously named `Code` labels Feb 22, 2021
@actual-nh
Copy link
Contributor Author

@BrettDong: Thanks! Mechanics: Character/Player is another appropriate label.

@BrettDong BrettDong added the Mechanics: Character / Player Character / Player mechanics label Feb 22, 2021
@actual-nh
Copy link
Contributor Author

The Travis error was due to a fields problem (see #47218 for one way this might be decreased).

Merge branch 'master' of https://github.com/CleverRaven/Cataclysm-DDA into weariness_fluctuations_1
@actual-nh
Copy link
Contributor Author

actual-nh commented Feb 23, 2021

@anothersimulacrum: Is it a feature, or a bugfix? I can admittedly see arguments for it being both...

@anothersimulacrum
Copy link
Member

You're making mechanical changes, so that was my reasoning.
We have tests making assertions about how this works though, so I suppose with the size of the changes and that, it's probably pretty safe.

@actual-nh
Copy link
Contributor Author

actual-nh commented Feb 23, 2021

If you'd like to have more time to review over it first, feel free to do the Feature Freeze; as I said, I can see arguments for both - and I'd feel horrible if something somehow did go wrong with it for a stable release (OTOH, players are unlikely to be terribly happy with their weariness levels jumping around...).

This (near-duplicate of have_weary_decrease()) is not in use, so
removing to be in accord with CDDA guidelines.
@actual-nh
Copy link
Contributor Author

Anticipating a request to remove unused code.

@actual-nh
Copy link
Contributor Author

actual-nh commented Feb 25, 2021

The LGTM failure was due to "There is not enough space on the disk"... sigh. (See #44598 for an earlier episode of this.)

The expected values for the 8-hour and 12-hour digging tasks, for the
portion before the 8-hour mark, were different. (Not sure who did
that - quite possibly me!) Fixing to approximate more-common values
from these two plus the 8-hour digging/8-hour rest tests, looking at
ones with only the weary tests run.
@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 1, 2021

@anothersimulacrum: I have just noticed that the expected timing numbers differ for the first 8 hours of the 8-hour and 12-hour digging tasks. I regret not having spotted this at the time of the metabolism de-randomization PR. I've just corrected the numbers for this PR's alterations (those I have lots of data for, thanks to #47791 - in particular, data for the weary tests done without any others).

EDIT: Now doing for 12 and 24-hour, and double-checking some of the 24-hour using new data. (The failure for the last one was with a sunburn test, which does not appear to be related.)

I realized that I should also align the 12 and 24-hour digging expected
values.

(cherry picked from commit e445990)
@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 2, 2021

The Travis failure was the grenade lethality test... 🙄

@actual-nh actual-nh changed the title Reduce weariness instability; test for illogical fluctuations in weary level Reduce weariness instability (includes tests) Mar 3, 2021
@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 5, 2021

GCC 7 General Build Matrix failure - fields test... sigh 🙄 .

@actual-nh
Copy link
Contributor Author

@I-am-Erk: Could you take a look at this one? Thanks!

@actual-nh actual-nh changed the title Reduce weariness instability (includes tests) Reduce weariness instability (includes tests); edited Mar 13, 2021
@actual-nh actual-nh changed the title Reduce weariness instability (includes tests); edited Reduce weariness instability (includes tests); clarified Mar 13, 2021
Copy link
Member

@anothersimulacrum anothersimulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, especially that it's decreasing the lag between starting to rest and actually recovering.

Once my reviews are resolved, I think this is good.

src/character.cpp Outdated Show resolved Hide resolved
src/character.cpp Outdated Show resolved Hide resolved
src/character.cpp Outdated Show resolved Hide resolved
actual-nh and others added 2 commits March 16, 2021 15:54
Co-authored-by: anothersimulacrum <anothersimulacrum@gmail.com>
Co-authored-by: anothersimulacrum <anothersimulacrum@gmail.com>
@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 17, 2021

Travis failure is due to a fields test:

 ../tests/field_test.cpp:359: FAILED:
(~[slow] ~[.])=>   CHECK( fields_test_turns() < time_limit_turns )
(~[slow] ~[.])=> with expansion:
(~[slow] ~[.])=>   300 (0x12c) < 300 (0x12c)
(~[slow] ~[.])=> with message:
(~[slow] ~[.])=>   Terrain should be irradiated under 300 turns

I've seen this one previously (enabled by wapcaplet's very helpful PR) - <= looks like it would be a good idea here... may put in a PR for this. EDIT: Done; see #48121.

@actual-nh
Copy link
Contributor Author

actual-nh commented Mar 31, 2021

Since #45316 has been committed, I'll need to do some conflict-fixing - probably tomorrow. EDIT: Done.

I forgot to include <cmath> for std::ceil, std::floor, and std::pow
when I moved things over to activity_tracker.cpp (character.cpp had
it already... as well as lots of others, of course!).
@actual-nh
Copy link
Contributor Author

actual-nh commented Apr 1, 2021

Checking on optimal predicted values to use in weary_test.cpp (changes as a result of #45316) - see #47791.

Unlike some of the other tests, for which one can figure out things
like consistency from test to test (8-hour vs 12-hour vs 24-hour) or
simply the weariness level not going down when it should be going up,
this one is currently close to impossible to figure out what SHOULD
result from it. It is more a check on whether the numbers are a lot
different than they should be. (There would be some justification in
changing the margins for this one to 10.) Thus, small adjustments in
it are sometimes needed; this is an example, with the value used
being from tests of multiple examples of it (see CleverRaven#47653).
@actual-nh
Copy link
Contributor Author

actual-nh commented Apr 2, 2021

A rewritten version of the above commit description:

Unlike some of the other tests, for which one can figure out things
like consistency from test to test (8-hour vs 12-hour vs 24-hour) or
simply the weariness level not going down when it should be going up,
this one is currently close to impossible to figure out what SHOULD
result from it. It is more a check on whether the numbers are a lot
different than they have been. (There would be some justification in
changing the margins for this one to 10.) Thus, small adjustments in
it are sometimes needed; this is an example, with the value used
being from tests of multiple examples of it (see #47791).

@ZhilkinSerg ZhilkinSerg merged commit a728b7c into CleverRaven:master Apr 7, 2021
@actual-nh actual-nh deleted the weariness_fluctuations_1 branch April 7, 2021 13:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[C++] Changes (can be) made in C++. Previously named `Code` Code: Tests Measurement, self-control, statistics, balancing. Mechanics: Character / Player Character / Player mechanics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants