Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust catch scoring to match stable score V2 #26405

Merged
merged 4 commits into from
Jan 9, 2024
Merged

Conversation

bdach
Copy link
Collaborator

@bdach bdach commented Jan 5, 2024

This pull request changes the catch scoring algorithm to - as far as I can tell - match stable score V2 exactly.

Key changes to the scoring algorithm

All non-bonus objects now are equally weighted for accuracy

Compare stable.

This is achieved by assigning 300 base value to Great, LargeTickHit, and SmallTickHit in GetBaseScoreForResult().

Notably, there are a couple of usages that sort by GetBaseScoreForResult(). The first I honestly can't really bring myself to care about too much (responsible for converting ancient lazer scores to new algorithm, which I find disposable and irrelevant at this time, but should be trivial to fix if we do care), and the second filters by IsBasic(), so is not broken by this in any capacity.

Combo / "accuracy" apportioning is now beatmap-dependent

Compare stable.

The "accuracy" proportion of score is given by

$$ 400,000 \cdot \frac{\texttt{numSmallDroplets}}{\texttt{numSmallDroplets} + \texttt{numFruits}} $$

And yes, it is intentional that the denominator does not include large droplets. Compare stable.

If the denominator of the expression above is 0, the accuracy portion is considered non-existent, which also matches stable.

The combo portion is 1 million minus the above.

Only fruits and large droplets contribute to combo portion

Compare stable.

Notably, for combo portion, normal fruits are weighted 300 and large droplets are weighted 100. Compare stable.

Only small droplets contribute to "accuracy" portion

Compare stable.

Bananas now give 200 bonus

Compare stable.

A notable edge case difference here is that on beatmaps that do not have any fruits, stable score V2 would just refuse to give any points whatsoever, even if bananas were caught. This matters for beatmaps that only consist of banana showers, such as this one. lazer will instead grant 1M score for free and then tack on points from bananas caught onto that.


All of the above was experimentally and organoleptically examined, cross-checked, and compared against stable. Comparisons were generally not easy, because replay playback consistency is not guaranteed, but I believe the implementation to now be matching 1:1.

Score conversion

To match the modified scoring algorithm, the conversion code from score V1 was also adjusted to match.

I started off from a very simple version of it, but on the resulting spreadsheet I found it to have very wrong results for some beatmaps. Further investigation showed that due to differences in rate of growth between score V1 and score V2, unique to catch, the previous method of estimation of combo portion was not suitable in specific circumstances.

To spare the reader's eyes, the derivation of the algorithm implemented in this PR is explained in this gist. Enter at your own risk.

The resulting spreadsheet showing the impact of the modified score conversion can be found here. The analysis of selected outliers on that spreadsheet can be found in this gist.

All replays used in testing above are here: replays.zip

Notably, some score conversions were and continue to be visibly broken, and are not fixed due to overarching concerns that affect all rulesets, not only catch. I may attempt to fix them separately, but that might require more time.

Implications on infrastructure

Legacy scoring attributes need to be adjusted for catch ruleset

The specific change that forces this is the change of bonus given by bananas. Because BonusScoreRatio is a ratio of bonus given by standardised scoring to ratio of bonus given by stable score V1 bonus, changing the number of points given by each banana obviously changes the ratio itself.

The simplest and safest way to resolve this would be to do a full run osu-difficulty-calculator over catch beatmaps and converts. However, a riskier but quicker option would be to do this manually, leveraging the fact that the change to BonusScoreRatio is simple - because bonus score per banana rose (200 / 50) = fourfold, it may be enough to just execute something along the lines of (WARNING, UNTESTED, CHECK FIVE TIMES BEFORE EXECUTING):

UPDATE `osu_beatmap_scoring_attribs`
SET `legacy_bonus_score_ratio` = 4 * `legacy_bonus_score_ratio`
WHERE `mode` = 2;

Stable scores need to be reimported

Because this pull changes how both accuracy and total score are computed when importing stable scores, a full reimport of stable scores for catch is required.


cc @Firuqwq @bastoo0 @SecreOsu @Kimitakari @Nirfu as potentially interested in cross-checking / testing this

@bdach bdach requested a review from a team January 5, 2024 21:24
@bdach bdach self-assigned this Jan 5, 2024
@bdach
Copy link
Collaborator Author

bdach commented Jan 8, 2024

!diffcalc
GENERATORS=score
RULESET=catch

Copy link

github-actions bot commented Jan 8, 2024

@peppy peppy self-requested a review January 9, 2024 15:50
@peppy peppy merged commit a4c9e9f into ppy:master Jan 9, 2024
17 checks passed
@bdach bdach deleted the catch-scoring branch January 9, 2024 16:30
bdach added a commit to bdach/osu that referenced this pull request Jan 10, 2024
…`difficultyPeppyStars` rounding

Fixes issue that occurs on *about* 245 beatmaps and was first described
by me on discord:

    https://discord.com/channels/188630481301012481/188630652340404224/1154367700378865715

and then rediscovered again during work on
ppy#26405:

    https://gist.github.com/bdach/414d5289f65b0399fa8f9732245a4f7c#venenog-on-ultmate-end-by-blacky-overdose-631

It so happens that in stable, due to .NET Framework internals, float
math would be performed using x87 registers and opcodes.
.NET (Core) however uses SSE instructions on 32- and 64-bit words.
x87 registers are _80 bits_ wide. Which is notably wider than _both_
float and double. Therefore, on a significant number of beatmaps,
the rounding would not produce correct values.

See following gist for corroboration of the above:

    https://gist.github.com/bdach/dcde58d5a3607b0408faa3aa2b67bf10

Thus, to crudely - but, seemingly accurately, after checking across
all ranked maps - emulate this, use `decimal`, which is slow, but has
bigger precision than `double`.

Doing this requires some fooling of the compiler / runtime (see second
inline comment in new method). To corroborate that this is required,
you can try the following code snippet:

    Console.WriteLine(string.Join(' ', BitConverter.GetBytes(1.3f).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', BitConverter.GetBytes(1.3).Select(x => x.ToString("X2"))));
    Console.WriteLine();

    decimal d1 = (decimal)1.3f;
    decimal d2 = (decimal)1.3;
    decimal d3 = (decimal)(double)1.3f;

    Console.WriteLine(string.Join(' ', decimal.GetBits(d1).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', decimal.GetBits(d2).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', decimal.GetBits(d3).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));

which will print

    66 66 A6 3F
    CD CC CC CC CC CC F4 3F

    0D 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
    0D 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
    8C 5D 89 FB 3B 76 00 00 00 00 00 00 00 00 0E 00

Note that despite `d1` being converted from a less-precise floating-
-point value than `d2`, it still is represented 100% accurately as
a decimal number.

After applying this change, recomputation of legacy scoring attributes
for *all* rulesets will be required.
bdach added a commit to bdach/osu that referenced this pull request Jan 10, 2024
…`difficultyPeppyStars` rounding

Fixes issue that occurs on *about* 246 beatmaps and was first described
by me on discord:

    https://discord.com/channels/188630481301012481/188630652340404224/1154367700378865715

and then rediscovered again during work on
ppy#26405:

    https://gist.github.com/bdach/414d5289f65b0399fa8f9732245a4f7c#venenog-on-ultmate-end-by-blacky-overdose-631

It so happens that in stable, due to .NET Framework internals, float
math would be performed using x87 registers and opcodes.
.NET (Core) however uses SSE instructions on 32- and 64-bit words.
x87 registers are _80 bits_ wide. Which is notably wider than _both_
float and double. Therefore, on a significant number of beatmaps,
the rounding would not produce correct values due to insufficient
precision.

See following gist for corroboration of the above:

    https://gist.github.com/bdach/dcde58d5a3607b0408faa3aa2b67bf10

Thus, to crudely - but, seemingly accurately, after checking across
all ranked maps - emulate this, use `decimal`, which is slow, but has
bigger precision than `double`. The single known exception beatmap
in whose case this results in an incorrect result is

    https://osu.ppy.sh/beatmapsets/1156087#osu/2625853

which is considered an "acceptable casualty" of sorts.

Doing this requires some fooling of the compiler / runtime (see second
inline comment in new method). To corroborate that this is required,
you can try the following code snippet:

    Console.WriteLine(string.Join(' ', BitConverter.GetBytes(1.3f).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', BitConverter.GetBytes(1.3).Select(x => x.ToString("X2"))));
    Console.WriteLine();

    decimal d1 = (decimal)1.3f;
    decimal d2 = (decimal)1.3;
    decimal d3 = (decimal)(double)1.3f;

    Console.WriteLine(string.Join(' ', decimal.GetBits(d1).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', decimal.GetBits(d2).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));
    Console.WriteLine(string.Join(' ', decimal.GetBits(d3).SelectMany(BitConverter.GetBytes).Select(x => x.ToString("X2"))));

which will print

    66 66 A6 3F
    CD CC CC CC CC CC F4 3F

    0D 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
    0D 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00
    8C 5D 89 FB 3B 76 00 00 00 00 00 00 00 00 0E 00

Note that despite `d1` being converted from a less-precise floating-
-point value than `d2`, it still is represented 100% accurately as
a decimal number.

After applying this change, recomputation of legacy scoring attributes
for *all* rulesets will be required.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

3 participants