
Handle date histogram scaling for table vis and avg_buckets metric #11929

Merged
merged 3 commits into from
Jul 19, 2017

Conversation

trevan
Contributor

@trevan trevan commented May 19, 2017

The line/area/bar vis handles time scaling when you select an interval that creates too many buckets. But the table vis and the new avg_metric with a date histogram don't.

I moved the scaling code into the tabify code to handle both the table vis and the line vis. I'm not sure how to handle two date histograms in a table that both cause scaling; in that case, the scaling multipliers would be wrong, since both are based on the main timespan. I'm not sure if it is worth handling, though.

For the avg_metric, I overrode getValue to check whether scaling is required by the customBucket. I couldn't use the "metricScale" value on the customBucket like the tabify code does, because that checks that all metrics are count or sum; since the avg_metric isn't count or sum, it would always be false. Instead, I check that the customMetric is count or sum and then apply the scaling. This won't work for nested sibling metrics where the avg_metric is nested inside a max or min bucket, but I think that would require quite a bit more work to get the scaling to take place, since you'd have to ignore the result from Elasticsearch and almost recalculate the pipeline.
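A minimal sketch of the getValue override described above, under stated assumptions: makeGetValue and its getScale callback are hypothetical names for illustration, not the actual PR code.

```javascript
// Sketch: rescale a sibling-pipeline (avg_bucket) result when the inner
// date histogram was widened. Only count and sum can be meaningfully
// rescaled; anything else (e.g. top_hits) is returned untouched.
function makeGetValue(getScale) {
  return function getValue(agg, bucket) {
    const customMetric = agg.params.customMetric;
    const scalingMetric = customMetric.type &&
      (customMetric.type.name === 'count' || customMetric.type.name === 'sum');

    let value = bucket[agg.id] && bucket[agg.id].value;
    if (value && scalingMetric) {
      // getScale is a hypothetical helper: requested interval / used interval
      value *= getScale(agg.params.customBucket);
    }
    return value;
  };
}
```

The `value &&` guard also skips 0, which needs no scaling anyway.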

@elasticmachine
Contributor

Can one of the admins verify this patch?

@trevan
Contributor Author

trevan commented May 19, 2017

@nreese, this should fix the issue that I mentioned in #4646

@trevan
Contributor Author

trevan commented May 19, 2017

Another possibility is to get rid of the scaling when using the table vis or the avg_bucket metric. I don't know if the scaling exists because of ES issues or just because of browser rendering issues.

@trevan trevan changed the title WIP: Handle date histogram scaling for table vis and avg_buckets metric Handle date histogram scaling for table vis and avg_buckets metric May 19, 2017
@thomasneirynck thomasneirynck added the Feature:Visualizations Generic visualization features (in case no more specific feature label is available) label May 19, 2017
@thomasneirynck
Contributor

jenkins, test this

@trevan
Contributor Author

trevan commented May 20, 2017

There is a test that is failing in master as well; it is the confirm_modal test. I've fixed the other ones.

@thomasneirynck
Contributor

jenkins, test this

@thomasneirynck thomasneirynck requested a review from ppisljar May 31, 2017 14:51
@ppisljar
Member

ppisljar commented Jun 6, 2017

Another possibility is to get rid of the scaling if using the table or avg_bucket metric. I don't know if the scaling exists for ES issues or just because of browser rendering issues.

@trevan I thought that is the current scenario? (that we don't do scaling for these?)

@ppisljar
Member

ppisljar commented Jun 6, 2017

Also, if there is an issue with avg_bucket, I bet the same applies to the other bucket aggs (min, max, sum).

@ppisljar
Member

ppisljar commented Jun 6, 2017

I just checked the bucket aggs: I set the aggregation to date histogram and the interval to millisecond, clicked play, and then checked the request in the spy panel, where I can see:

  "aggs": {
    "1": {
      "avg_bucket": {
        "buckets_path": "1-bucket>_count"
      }
    },
    "1-bucket": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30m",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      }
    }
  },

so the scaling does apply even for bucket aggs (without this PR) .... ?

@ppisljar
Member

ppisljar commented Jun 6, 2017

Also, in the data table, scaling seems to be applied correctly (before this PR)?

@trevan
Contributor Author

trevan commented Jun 6, 2017

@ppisljar, I'm not talking about modifying the query to send a different bucket interval. I'm talking about scaling the results, so that if I request bytes/second, I get bytes/second even if the bucket interval has changed to 30m.

If you do a line chart where the interval has to be scaled, the resulting line will show the originally requested value (bytes/second instead of bytes/fixed_time_interval). But the table chart doesn't do that, nor do the pipeline aggregations. In #4646, someone requested to see the average bytes/second over a large timespan; because that causes the bucket size to change, instead of getting average bytes/second, you get average bytes/fixed_time_interval.
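For concreteness, the correction described above amounts to multiplying each count by the ratio of the requested interval to the interval actually used. A sketch with an assumed helper name, not Kibana's actual code:

```javascript
// Sketch: when Kibana widens 1s buckets to 30m buckets, a per-bucket
// count must be multiplied by requested/used to stay "per second".
function metricScale(requestedMs, usedMs) {
  return requestedMs / usedMs;
}

// 180000 hits in one 30-minute bucket is roughly 100 hits per second:
const perSecond = 180000 * metricScale(1000, 30 * 60 * 1000);
```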

@ppisljar
Member

ppisljar commented Jun 7, 2017

Thanks for the clarification @trevan, will try to play with this today.

@@ -61,14 +63,17 @@ export function AggResponseTabifyProvider(Private, Notifier) {
        }
        break;
      case 'metrics':
-       const value = agg.getValue(bucket);
+       let value = agg.getValue(bucket);
+       if (aggScale !== 1) {
Contributor

I'd remove if-statement

Contributor Author

I should probably put a comment here, but the if statement is there to prevent "scaling" non-numbers. Since only "count" and "sum" are allowed to scale, if you pick a different metric (such as top_hits) and don't have this if statement, it could multiply a string by the scale and get NaN. By only scaling when the scale has changed, it is guaranteed that the value being scaled is a number.

Hope that makes sense :)
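The guard being discussed can be sketched like this (aggScale stands for the date histogram's scale factor; the function name is illustrative):

```javascript
// Sketch: multiply only when scaling actually occurred. Since aggScale
// can only differ from 1 for count/sum metrics, non-numeric values
// (e.g. a top_hits string) are never multiplied into NaN.
function scaleValue(value, aggScale) {
  if (aggScale !== 1) {
    value *= aggScale;
  }
  return value;
}
```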

Contributor

now it does :)

Contributor

@thomasneirynck thomasneirynck left a comment

thanks @trevan, this is addressing a gnarly issue

Overall, I'm in favor of this change. It works around a limitation we introduced because we don't want to overload ES with too large of a query or Kibana with too large of a result. This is a technical limitation that isn't all that relevant to the end-user.

But it is somewhat of an obscure feature. And Kibana can already surprise users with all the normalization going on in the background; the auto-correct of the interval is one example. With this PR, Kibana actually tries to "correct" this after Kibana already "corrected" it, but only for aggregation types where we can reasonably do so (e.g. taking an average of counts is something we can do, but not an average of averages). Just typing this makes my head spin ;)

So when we go down this road, we need to communicate this clearly.

I think the easiest way to do this would be to make the label more explicit and have the denominator in the default title.

So Kibana table's column title or y-axis label would be something like:

Overall Average of count per second, or whatever metric it is that Kibana has normalized.

I am not quite sure what you are referring to when talking about this happening in line charts/bar charts. Could you expand on this?

e.g. below is a before and after of a line chart doing an average count per second.

before: [image]

after: [image]

Given the changes in this PR, it'd also need some tests.

],
getValue: function (agg, bucket) {
const customMetric = agg.params.customMetric;
const scalingMetric = customMetric.type && (customMetric.type.name === 'count' || customMetric.type.name === 'sum');
Contributor

This is dense. This needs some inline comment to explain why we will scale only for counts and sums.

Contributor Author

I copied that line from https://github.com/elastic/kibana/blob/master/src/ui/public/agg_types/buckets/date_histogram.js#L116. I'm not sure why Kibana only scales for counts and sums so I just left it as is.

Contributor

could you extract to function? it will show the relationship

Contributor Author

I created an "isScalable" function on the metric type; count and sum return true.
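A rough sketch of that refactor; the property and function names here are assumptions based on the comment, not the merged code:

```javascript
// Sketch: each metric agg type declares whether its values may be
// rescaled, instead of repeating the name check at every call site.
const countMetric = { name: 'count', isScalable: () => true };
const sumMetric = { name: 'sum', isScalable: () => true };
const topHitsMetric = { name: 'top_hits', isScalable: () => false };

function shouldScale(metricType) {
  return Boolean(metricType && metricType.isScalable());
}
```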

const scalingMetric = customMetric.type && (customMetric.type.name === 'count' || customMetric.type.name === 'sum');

let value = bucket[agg.id] && bucket[agg.id].value;
if (value && scalingMetric) {
Contributor

@thomasneirynck thomasneirynck Jun 16, 2017

You probably can just remove the value from the check (?). This sort of works due to the falsy nature of 0, and 0 doesn't need scaling anyway.

@trevan
Contributor Author

trevan commented Jun 16, 2017

@thomasneirynck, here's a line chart showing how it works on a non modified Kibana.

[screenshot: line chart with spy panel, 2017-06-16]

If you look at the chart, the maximum it shows is 10, but if you look at the spy table at the bottom, the values go over 10. That is because I asked for 1 minute buckets but Kibana requested 30 minute buckets.

If I do this same visualization as a bar or area, it would also do the conversion.

But if I do a table visualization, it shows the value for the 30 minute buckets but says that it is showing per minute (compare the column header to the actual data):

[screenshot: table visualization, 2017-06-16]

I totally agree that the metric name for sibling aggregations using a date histogram bucket should show what timespan is being used. I think that is a separate issue from this one, but I could try to fix it.

Contributor

@thomasneirynck thomasneirynck left a comment

Thx @trevan, apologies for the delay on this one. I think this is a good addition to the table. @ppisljar, do you want to take a second look?

As for improving the labeling, OK, we can wait on that. It's a similar problem to #12816, so let's go through that one first before making any more changes there.

],
getValue: function (agg, bucket) {
const customMetric = agg.params.customMetric;
const scalingMetric = customMetric.type && (customMetric.type.name === 'count' || customMetric.type.name === 'sum');
Contributor

could you extract to function? it will show the relationship

@trevan
Contributor Author

trevan commented Jul 18, 2017

I rebased against master

@ppisljar
Member

Seems to work well, I really like that the spy panel values match the chart values when scaling happens.
There is an issue with the label, for example when the column header says timespan per minute (but it's actually per 30 minutes or so) ... but this is something we are addressing in another PR, #12816.

Member

@ppisljar ppisljar left a comment

LGTM

@ppisljar
Member

@thomasneirynck @trevan should we backport ?

@ppisljar ppisljar merged commit 396b073 into elastic:master Jul 19, 2017
@trevan
Contributor Author

trevan commented Jul 19, 2017

@ppisljar, I don't need it backported.

Labels
Feature:Visualizations Generic visualization features (in case no more specific feature label is available) release_note:fix review v6.0.0-rc1 v6.0.0