Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the roundoff error in the Avg of N others calculation. #1248

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

jpwhite4
Copy link
Member

@jpwhite4 jpwhite4 commented Mar 6, 2020

The "Average of N others" calculation was performed using the
AVG() function in SQL then multiplying by the number of rows and
dividing by the total number of data series. Since AVG() is
SUM() / COUNT() the calculation was:

   SUM()         COUNT()
 ---------   *  ---------
  COUNT()         total

Performing the calculation in this way increases the roundoff error compared to simply computing SUM() / total. This pull request changes the code to instead calculate the average as

   SUM()
 ---------
   total

which has lower roundoff error due to fewer operations.

The keen eyed reader might ask why not do the calculation in SQL rather
than the SUM() in SQL and division in PHP. This is to maintain backwards
compatibility with the existing behaviour where average values of
zero are plotted on the chart as a zero. This differs from the
rest of XDMoD where zero values are converted to null and are not plotted. If the calculation were
preformed in SQL then a zero would be converted to null in the
convertSQLtoPHP() function. This, of course, could be changed too, but
this would be a much more intrusive change.

The regression tests that exercise this function are in the UsageChart image export and the MetricChart regression. For the data in the docker image, the expected values for the person group-by change by the most amount.

The "Average of N others" calculation was performed using the
AVG() function in SQL then multiplying by the number of rows and
dividing by the total number of data series. Since AVG() is
SUM()/COUNT() the calculation was:
```
   SUM()         COUNT()
 ---------   *  ---------
  COUNT()         total
```

Performing the calculation in this way increases the roundoff error.
This pull request changes the code to instead calculate the average as
```
   SUM()
 ---------
   total
```
which has lower roundoff error due to fewer operations.

The keen eyed reader might ask why not do the calculation in SQL rather
than the SUM() in SQL and division in PHP. This is to maintain backwards
compatibility with the existing behaviour where if the average value is
zero then it is plotted on the chart as a zero. This differs from the
rest of XDMoD where zero values are not plotted. If the calculation were
preformed in SQL then a zero would be converted to null in the
`convertSQLtoPHP()` function. This, of course, could be changed too, but
this would be a much more intrusive change.
Copy link
Contributor

@jsperhac jsperhac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on your appetite for eliding commented cruft, you could also dispense with non-executed code at lines 268, 554, 723, 793.

Copy link
Contributor

@jsperhac jsperhac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on your appetite for eliding commented cruft, you could also dispense with non-executed code at lines 268, 554, 723, 793.

Comment on lines +776 to +780
$useMean = $operation === 'SUM' && (strpos($statAlias, 'avg_') !== false
|| strpos($statAlias, 'count') !== false
|| strpos($statAlias, 'utilization') !== false
|| strpos($statAlias, 'rate') !== false
|| strpos($statAlias, 'expansion_factor') !== false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting off the second part of the expression might clarify it, i.e.

$useMean = $operation === 'SUM' &&
(
rest-of-long-expression-line 1
rest-of-long-expression-line 2
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood that these are all temporary improvements, so I'm marking this PR accepted as-is. Thanks much!

@plessbd
Copy link
Contributor

plessbd commented Mar 6, 2020

I think @jpwhite4 forgot to mention that this will all be going away with his refactored code to make all of this faster. The point of this pull request is to make the updated code changes now with tests so that when he updated the code fully it wont be changing how the values are calculated.

@jtpalmer jtpalmer added this to the 9.0.0 milestone Mar 9, 2020
@jtpalmer jtpalmer added enhancement Enhancement of the functionality of an existing feature metric Updates or additions of metrics in a realm Category:Infrastructure Internal infrastructure updates/changes labels Mar 9, 2020
|| strpos($stat, 'rate') !== false
|| strpos($stat, 'expansion_factor') !== false;

$operation = $useMean ? "AVG" : "SUM";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do decide to update the comments then this function's should probably be updated to indicate that AVG is not performed in the database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Category:Infrastructure Internal infrastructure updates/changes enhancement Enhancement of the functionality of an existing feature metric Updates or additions of metrics in a realm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants