[Regression] Telemetry Usage Stats API no longer honors `timeRange` parameter #109960

ycombinator · 2021-08-24T20:50:15Z

Kibana version:

7.11.0

Elasticsearch version:

7.11.0

Describe the bug:

Up through version 7.10.0, the POST /api/telemetry/v2/clusters/_stats API used to accept a timeRange parameter in the request body like so:

```json
{
  "timeRange": {
    "min": "2021-08-24T10:20:08-07:00",
    "max": "2021-08-24T11:19:31-07:00"
  }
}
```

Starting from version 7.11.0, however, the same API call returns the following error response:

```json
{"statusCode":400,"error":"Bad Request","message":"[request body.timeRange]: definition for this key is missing"}
```

Steps to reproduce:

```shell
curl -X POST "https://$HOSTPORT/api/telemetry/v2/clusters/_stats" \
  -u elastic:$PASSWORD \
  -H 'kbn-xsrf: true' \
  -H 'Content-Type: application/json' \
  -d '{"unencrypted":true,"timeRange":{"max":"2021-08-24T11:17:30-07:00","min":"2021-08-24T10:20:08-07:00"}}'
```

Expected behavior:

The API honors the `timeRange` parameter as before and returns usage stats.

Context:

The Cloud Billing team calls this API periodically (roughly hourly) on every Kibana instance running in Cloud. We are not currently using the data returned by this API, but it would be good to know whether this is a regression that will be fixed or a deliberate change. If it's the latter, it would also be good to know whether there's an alternate way to make the equivalent request starting with Kibana version 7.11.0.


elasticmachine · 2021-08-24T20:50:17Z

Pinging @elastic/kibana-telemetry (Team:KibanaTelemetry)

afharo · 2021-08-26T09:24:37Z

@ycombinator sorry about the confusion. It's indeed an intentional change: #81579

We realized that each collector internally needed different ranges (#55171) to report meaningful data, regardless of the `from` and `to` values passed in. So we thought it made sense to provide only one timestamp.

If it's not causing too much of a problem, I'd close this issue with the "works as designed" flag.

ycombinator · 2021-08-26T11:30:51Z

Yeah, it's fine to close this issue with the "works as designed" flag, but it would be helpful if you could post an example or two here of what the equivalent calls should be starting with Kibana version 7.11.0. AFAICT, this API is not documented (probably because it's an Elastic-internal API), so I can't easily tell what the new contract is. This will help us (Cloud Billing) write code that calls the right APIs depending on the Kibana version, since a variety of Kibana versions could be running in Cloud at any time.

afharo · 2021-08-31T09:44:25Z

@ycombinator this is the new contract:

```
POST /api/telemetry/v2/clusters/_stats
{
  "unencrypted": true
}
```

As you may have noticed, there is no concept of a timestamp. This is because we could only provide historical data when the source of the telemetry stats was the monitoring indices. For local telemetry (when Kibana actively runs the collectors), it's always now (the code has always ignored `timeRange` for this type of collection).
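Cloud Billing asked for a way to call the right contract depending on the Kibana version. Here is a minimal sketch of building the request body, assuming the caller already knows the target Kibana's version string. `parse_version` and `build_stats_body` are hypothetical helper names, not part of any Kibana client. The only facts used are from this thread: versions up to 7.10.x accepted `timeRange`, and 7.11.0+ rejects it with a 400.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any, Optional


def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '7.11.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def build_stats_body(
    kibana_version: str,
    time_min: Optional[datetime] = None,
    time_max: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /api/telemetry/v2/clusters/_stats (hypothetical helper)."""
    body: dict[str, Any] = {"unencrypted": True}
    if parse_version(kibana_version) < (7, 11):
        # Before 7.11 the API accepted timeRange; it only constrained the
        # .monitoring-* query, while local collection always reported "now".
        if time_min is not None and time_max is not None:
            body["timeRange"] = {
                "min": time_min.isoformat(),
                "max": time_max.isoformat(),
            }
    # From 7.11.0 on, sending timeRange is rejected with a 400
    # ("definition for this key is missing"), so it is never included.
    return body
```

For example, `build_stats_body("7.11.0")` returns `{"unencrypted": True}`. Passing a time range to a 7.11+ instance is silently dropped rather than triggering the 400.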

Sorry! We didn't know you used it for Cloud Billing purposes. We'll keep you in mind for any changes we may need to introduce in the future (e.g., #96538).

cc @elastic/kibana-core

ycombinator · 2021-08-31T11:54:02Z

Thanks @afharo! I'll look into how this change will impact the Cloud Billing code and file any follow up issues if necessary. But for now, we're good to close this issue here.

[EDIT] And thanks for linking to #96538 as well. I have subscribed to it now.

ycombinator · 2021-08-31T23:12:55Z

> As you may have noticed, there is no concept of a timestamp. This is because we could only provide historical data when the source of the telemetry stats was the monitoring indices. For local telemetry (when Kibana actively runs the collectors), it's always now (the code has always ignored `timeRange` for this type of collection).

@afharo I want to clarify this part a bit.

Cloud Billing's use case for this API is to know which features, e.g. Reporting, were being used between a start and end timestamp. This is because there is a process we run (roughly every hour, but the exact interval could vary a bit) that calls this API and asks the question: give me the telemetry usage stats between the last time I ran (timeRange.min) and now (timeRange.max).

Starting with 7.11, since there is no concept of `timeRange` for this API anymore, what time range will the metrics in the response cover? I know you said "it's always now", but does that apply to `timeRange.max`? If so, what's `timeRange.min` assumed to be?

Also, under what conditions does this API use monitoring indices as the source vs. local collection as the source? Is this something that can vary, depending on when the API is called?

afharo · 2021-09-01T10:47:35Z

> Cloud Billing's use case for this API is to know which features, e.g. Reporting, were being used between a start and end timestamp. This is because there is a process we run (roughly every hour, but the exact interval could vary a bit) that calls this API and asks the question: give me the telemetry usage stats between the last time I ran (timeRange.min) and now (timeRange.max).

That's kind of the reason we preferred to remove the concept of time from this API. AFAIK, this API has never returned the usage stats between min and max (as in the delta between min and max). It always returns the full snapshot. In the Reporting use case, the collector reports usage both for `all` (all time) and for the last 7 days. Triggering a request periodically will only ever show numbers that never go down for `all` (the last-7-days numbers might decrease if Reporting wasn't used in that period of time).
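Because each call returns a full snapshot rather than a delta, a poller that wants "usage since my last run" has to subtract consecutive snapshots itself. A minimal sketch follows. The helper names are hypothetical, and the nested payload shape in the usage example is made up for illustration; it is not the real Reporting schema.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def flatten_numbers(obj: Any, prefix: str = "") -> dict[str, float]:
    """Flatten nested mappings into {'a.b.c': value}, keeping numeric leaves only.

    Booleans, strings and lists are skipped.
    """
    out: dict[str, float] = {}
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten_numbers(value, path))
    elif isinstance(obj, (int, float)) and not isinstance(obj, bool):
        out[prefix] = obj
    return out


def snapshot_delta(prev: Mapping[str, Any], curr: Mapping[str, Any]) -> dict[str, float]:
    """Subtract two full snapshots metric by metric (metrics present in both only).

    Only meaningful for cumulative ("all") counters: rolling windows such as the
    last-7-days figures can legitimately decrease between polls.
    """
    before, after = flatten_numbers(prev), flatten_numbers(curr)
    return {path: after[path] - before[path] for path in after if path in before}
```

With `prev = {"reporting": {"all": {"csv": 3}, "last7Days": {"csv": 2}}}` and `curr = {"reporting": {"all": {"csv": 5}, "last7Days": {"csv": 1}}}`, the delta is `{"reporting.all.csv": 2, "reporting.last7Days.csv": -1}`. The negative value is exactly the rolling-window effect described above.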

> Also, under what conditions does this API use monitoring indices as the source vs. local collection as the source? Is this something that can vary, depending on when the API is called?

Pre-7.11, if it found any data in the `.monitoring-*` indices (on Monitoring clusters), it would return telemetry from there. Otherwise, it would fall back to the local collection. The `timeRange` only affected the time constraints applied to the query in the `.monitoring-*` indices. However, as you may already know, usage stats were only retrieved every 24h, so `timeRange.min` was always adjusted to ensure a 24h span between min and max.
In addition to that, the data available in the .monitoring-* indices had the same structure as the local collection, so the snapshot/no-delta principles apply here as well. The only difference is that you could get historical snapshot data.

From 7.11, we stopped shipping Kibana's usage to the `.monitoring` indices (we may still report usage from Logstash & Beats). So all Kibana usage will likely come from the local collection (the first item of the array in the response), and usage from Logstash and Beats may appear in the subsequent items.

To identify the source of the data: if you see `collection: 'local'` at the root of the object, you're looking at a locally sourced collection, which means the response is as fresh as it can get.
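The source check above can be applied to each element of the response array. Here is a minimal sketch, assuming the `unencrypted: true` response is a JSON array of objects as described in this thread. `split_by_source` is a hypothetical helper. Treating every entry without `collection: 'local'` as monitoring-sourced is an assumption, since the thread does not say what value monitoring-sourced entries carry.

```python
from __future__ import annotations

from typing import Any


def split_by_source(
    response: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate locally collected entries from (assumed) monitoring-sourced ones."""
    # collection: 'local' at the root marks a fresh, locally collected snapshot.
    local = [item for item in response if item.get("collection") == "local"]
    # Assumption: anything else (or a missing key) came from .monitoring-* indices.
    monitored = [item for item in response if item.get("collection") != "local"]
    return local, monitored
```

On 7.11+, `local` would normally hold the single Kibana entry that arrives first in the array.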

I've tried to summarize all the above in the table below:

| Kibana version | Type of collection | Time range |
| --- | --- | --- |
| Pre-7.11 | Monitoring-sourced, falling back to "local" if the cluster does not have `.monitoring` indices (i.e., it's not a Monitoring cluster) | Used to retrieve the snapshot reported between `max-24h` and `max` when querying the `.monitoring-*` indices. Always `now` for "local". |
| 7.11 and later | "local" AND monitoring-sourced (if available), as in response = `[{...localUsage}, {...monitoredClusterOne}, {...monitoredClusterTwo}, ...]` | `now` for "local"; `now-20min` to `now` when querying the `.monitoring-*` indices |
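The time-range column of the table above can be encoded as a small lookup, which a version-aware poller could use to label what each response entry reflects. This is a sketch of the table only, not of Kibana's code. `effective_window` and the `"monitoring"` source label are hypothetical. For local collection the window is returned as `(now, now)` to mean a point-in-time snapshot, even though individual metrics inside it have their own windows (all time, last 7 days, and so on).

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def effective_window(
    kibana_version: str,
    source: str,
    now: datetime,
    requested_max: Optional[datetime] = None,
) -> tuple[datetime, datetime]:
    """Return (start, end) of the window a snapshot reflects, per the summary table.

    source is "local" or "monitoring" (hypothetical label for .monitoring-* data).
    """
    major, minor = (int(part) for part in kibana_version.split(".")[:2])
    if source == "local":
        # Collectors run on demand: a point-in-time snapshot taken "now".
        return now, now
    if source != "monitoring":
        raise ValueError(f"unknown source: {source!r}")
    if (major, minor) < (7, 11):
        if requested_max is None:
            raise ValueError("pre-7.11 monitoring queries need timeRange.max")
        # timeRange.min was adjusted so the span is always max-24h .. max.
        return requested_max - timedelta(hours=24), requested_max
    # 7.11+: no timeRange; .monitoring-* indices are queried for now-20min .. now.
    return now - timedelta(minutes=20), now
```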

I'll cc @Bamieh just in case he wants to add anything.

ycombinator · 2021-09-01T15:52:12Z

Thanks for the detailed explanation and the summary table at the end, @afharo!

> That's kind of the reason we preferred to remove the concept of time from this API. AFAIK, this API has never returned the usage stats between min and max (as in the delta between min and max). It always returns the full snapshot. In the Reporting use case, the collector reports usage both for `all` (all time) and for the last 7 days. Triggering a request periodically will only ever show numbers that never go down for `all` (the last-7-days numbers might decrease if Reporting wasn't used in that period of time).

When you say "full snapshot", you mean the duration between when the Kibana server started up and the time of the API request (now), right?

afharo · 2021-09-02T09:22:52Z

> When you say "full snapshot", you mean the duration between when the Kibana server started up and the time of the API request (now), right?

As always, it depends 🙃
In the snapshot, some metrics are along the lines of index size or saved-object counts (which usually keep growing ever since the cluster was created), and others are "last day/7/30/90 days" counts. However, some metrics are only kept in memory (like the ops metrics' request statuses), so restarts may affect them.
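For in-memory metrics that restarts can affect, a poller diffing snapshots needs some rule for a value that goes down. One common heuristic, borrowed from counter-reset handling in metrics systems, is sketched below. This is an assumption about how a consumer might cope; it is not Kibana behavior, and `counter_increase` is a hypothetical name.

```python
def counter_increase(previous: float, current: float) -> float:
    """Increase of a cumulative counter between two polls.

    Heuristic (assumption): if the value went down, treat it as a reset (e.g. an
    in-memory metric after a Kibana restart) and count the whole current value
    as new usage since the reset.
    """
    if current >= previous:
        return current - previous
    return current
```

The trade-off is that any usage between the previous poll and the restart is lost. This heuristic should not be applied to rolling-window metrics (last 7/30/90 days), which can drop without any reset.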

ycombinator · 2021-09-02T09:42:39Z

That makes sense, thanks @afharo.

At the moment the only feature we're looking at from this API's response is Reporting. That will definitely change in the future. For Reporting (the "all" key), are the metrics from the time the cluster was created or from the time of the latest restart?

afharo · 2021-09-02T09:46:15Z

I'll defer to @elastic/kibana-reporting-services to fully confirm. But, looking at the implementation, I'd say it covers the entire life of the cluster.

ycombinator added the `bug` and `Team:KibanaTelemetry` labels on Aug 24, 2021

ycombinator closed this as completed Aug 31, 2021

lukeelmers added the `Team:Core` and `Feature:Telemetry` labels and removed the `Team:KibanaTelemetry` label on Oct 1, 2021
