-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the roundoff error in the Avg of N others calculation. #1248
Conversation
The "Average of N others" calculation was performed using the AVG() function in SQL then multiplying by the number of rows and dividing by the total number of data series. Since AVG() is SUM()/COUNT() the calculation was: ``` SUM() COUNT() --------- * --------- COUNT() total ``` Performing the calculation in this way increases the roundoff error. This pull request changes the code to instead calculate the average as ``` SUM() --------- total ``` which has lower roundoff error due to fewer operations. The keen eyed reader might ask why not do the calculation in SQL rather than the SUM() in SQL and division in PHP. This is to maintain backwards compatibility with the existing behaviour where if the average value is zero then it is plotted on the chart as a zero. This differs from the rest of XDMoD where zero values are not plotted. If the calculation were preformed in SQL then a zero would be converted to null in the `convertSQLtoPHP()` function. This, of course, could be changed too, but this would be a much more intrusive change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on your appetite for eliding commented cruft, you could also dispense with non-executed code at lines 268, 554, 723, 793.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on your appetite for eliding commented cruft, you could also dispense with non-executed code at lines 268, 554, 723, 793.
$useMean = $operation === 'SUM' && (strpos($statAlias, 'avg_') !== false | ||
|| strpos($statAlias, 'count') !== false | ||
|| strpos($statAlias, 'utilization') !== false | ||
|| strpos($statAlias, 'rate') !== false | ||
|| strpos($statAlias, 'expansion_factor') !== false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting off the second part of the expression might clarify it, i.e.
$useMean = $operation === 'SUM' &&
(
rest-of-long-expression-line 1
rest-of-long-expression-line 2
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood that these are all temporary improvements, so I'm marking this PR accepted as-is. Thanks much!
I think @jpwhite4 forgot to mention that this will all be going away with his refactored code to make all of this faster. The point of this pull request is to make the updated code changes now with tests so that when he updated the code fully it wont be changing how the values are calculated. |
|| strpos($stat, 'rate') !== false | ||
|| strpos($stat, 'expansion_factor') !== false; | ||
|
||
$operation = $useMean ? "AVG" : "SUM"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do decide to update the comments then this function's should probably be updated to indicate that AVG
is not performed in the database.
The "Average of N others" calculation was performed using the
AVG()
function in SQL then multiplying by the number of rows anddividing by the total number of data series. Since
AVG()
isSUM() / COUNT()
the calculation was:Performing the calculation in this way increases the roundoff error compared to simply computing
SUM() / total
. This pull request changes the code to instead calculate the average aswhich has lower roundoff error due to fewer operations.
The keen eyed reader might ask why not do the calculation in SQL rather
than the
SUM()
in SQL and division in PHP. This is to maintain backwardscompatibility with the existing behaviour where average values of
zero are plotted on the chart as a zero. This differs from the
rest of XDMoD where zero values are converted to
null
and are not plotted. If the calculation werepreformed in SQL then a zero would be converted to null in the
convertSQLtoPHP()
function. This, of course, could be changed too, butthis would be a much more intrusive change.
The regression tests that exercise this function are in the UsageChart image export and the MetricChart regression. For the data in the docker image, the expected values for the person group-by change by the most amount.