receive: Create proposal for backfilling on remote write #2599
Comments
As the discussion prometheus/prometheus#535 evolved into something quite complex (I see it isn't so easy; initially I thought this was a Prometheus feature that I just didn't know about 😄), would this enhancement be a Thanos ad hoc solution to backfill historical data into Prometheus? I'm building an analytical application that reads historical usage data from customers to do some calculations and identify potential optimizations based on this historical set of metrics. We're writing some exporters, and we wouldn't mind writing directly to any Prometheus file format, or pushing into a Thanos endpoint that can backfill data, but I see no docs on how to do this "manually". Any references here that can help me understand how to do this? Something using gRPC client streaming would be very nice ❤️! But this is an implementation detail that we can handle ourselves. Anyway, just to be in sync with the community priorities: we can live without backfilling for the next months, as we're currently developing features and researching solutions, but can I expect the ETA to be at most by the end of this year? If you need, I can help implement it once you've decided how the architecture should be! And thanks for proactively opening this issue; I think this is a really good way to start solving the backfill issue that the community wants so much, judging by the number of reactions and discussions in threads related to this subject. |
Nice! Welcome to the community 👋 So actually you might be interested in those discussions:
We can't promise anything, but it looks like some backfilling options are coming pretty soon! (: Currently it's possible, but it requires some magic in Go (: |
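For anyone landing here and wondering what that "magic in Go" can look like, below is a minimal, hedged sketch that writes historical samples into an on-disk TSDB block using the Prometheus TSDB library's block writer; the resulting block can then sit in (or be uploaded from) a data directory. The metric name, labels, and directory are invented for illustration, and the exact package paths and signatures have moved between Prometheus releases, so verify them against the module version you actually use.

```go
// Rough sketch only: turn historical samples into a TSDB block on disk.
// Package paths and signatures below assume a recent Prometheus 2.x module
// and may differ in the version you vendor.
package main

import (
	"context"
	"log"
	"time"

	kitlog "github.com/go-kit/log"
	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/tsdb"
)

func main() {
	ctx := context.Background()

	// BlockWriter accumulates samples in memory and flushes them as a block
	// into the given directory (a Prometheus/Thanos data dir). The directory
	// name here is a placeholder.
	w, err := tsdb.NewBlockWriter(kitlog.NewNopLogger(), "./backfill-data", tsdb.DefaultBlockDuration)
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	app := w.Appender(ctx)
	// Hypothetical series used purely for illustration.
	lbls := labels.FromStrings("__name__", "historical_usage_total", "customer", "acme")

	// Append one sample per hour for the last 24 hours (timestamps are in ms).
	now := time.Now()
	for i := 24; i > 0; i-- {
		ts := now.Add(-time.Duration(i) * time.Hour).UnixMilli()
		if _, err := app.Append(0, lbls, ts, float64(i)); err != nil {
			log.Fatal(err)
		}
	}
	if err := app.Commit(); err != nil {
		log.Fatal(err)
	}

	// Flush writes the block (with its own ULID) to disk.
	id, err := w.Flush(ctx)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("wrote block %s", id)
}
```

From there, shipping the block to object storage (for example via the sidecar's upload path) is the general shape of the block-based backfill approaches discussed later in this thread.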
Hello 👋 Looks like there was no activity on this issue for the last 30 days. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
This use case is very important for us, since even a normal amount of upstream batch metrics ingestion easily results in metrics lagging by more than an hour (but within 2 hours). The linked discussion in Cortex is very helpful, but ultimately doesn't help until it is implemented here in Thanos :) Can we have this issue re-opened so it doesn't get forgotten? |
This has also been discussed at the Prometheus dev summit. Once Prometheus implements it, we should probably follow the same strategy in Thanos. I believe it's unlikely that it will happen in the receive component, but that's a detail. |
Hello. Not quite sure if this issue is exactly the right one, but coming from #2490 I have a use case. Please tell me if there is a ticket that fits better. We are currently implementing anomaly detection with Thanos. It works well so far, but it is a quite complex query which needs data from up to 4 weeks ago. Due to the complexity, it is more readable and performs better when reusing calculations via intermediate recording rules. Of course we do not have data from 4 weeks ago yet, because we just started writing the recording rules. Therefore we would need to wait a full 4 weeks to be sure that the rules are working properly. With backfilling we might be able to retrospectively calculate the recording rules and see the result immediately. Of course we can inline most of the queries and skip the intermediate recording rules, but this requires a lot of resources. Also it is impossible when using a counter which does not have a

Here is an example rule file.

The version without intermediate recording rules:

```yaml
groups:
- name: anomaly-detection-1m
interval: 30s
rules:
- record: rule_action:wafsc_evaluations:rate1m:seasonal_prediction
expr: >
quantile(0.5,
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 166h)
+ avg_over_time(rule_action:wafsc_evaluations:rate1m[1w])
- avg_over_time(rule_action:wafsc_evaluations:rate1m[1w] offset 1w)
, "offset", "1w", "", "")
or
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 334h)
+ avg_over_time(rule_action:wafsc_evaluations:rate1m[1w])
- avg_over_time(rule_action:wafsc_evaluations:rate1m[1w] offset 2w)
, "offset", "2w", "", "")
or
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 502h)
+ avg_over_time(rule_action:wafsc_evaluations:rate1m[1w])
- avg_over_time(rule_action:wafsc_evaluations:rate1m[1w] offset 3w)
, "offset", "3w", "", "")
) without (offset)
- record: rule_action:wafsc_evaluations:rate1m:seasonal_prediction:z_score
expr: >
(
rule_action:wafsc_evaluations:rate1m
- rule_action:wafsc_evaluations:rate1m:seasonal_prediction
) / stddev_over_time(rule_action:wafsc_evaluations:rate1m[1w])
```

The optimized version:

```yaml
groups:
- name: anomaly-detection-1m
interval: 30s
rules:
- record: rule_action:wafsc_evaluations:rate1m:avg_over_time_1w
expr: avg_over_time(rule_action:wafsc_evaluations:rate1m[1w])
- record: rule_action:wafsc_evaluations:rate1m:stddev_over_time_1w
expr: stddev_over_time(rule_action:wafsc_evaluations:rate1m[1w])
- record: rule_action:wafsc_evaluations:rate1m:z_score
expr: >
(
rule_action:wafsc_evaluations:rate1m -
rule_action:wafsc_evaluations:rate1m:avg_over_time_1w
) / rule_action:wafsc_evaluations:rate1m:stddev_over_time_1w
- record: rule_action:wafsc_evaluations:rate1m:seasonal_prediction
expr: >
quantile(0.5,
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 166h)
+ rule_action:wafsc_evaluations:rate1m:avg_over_time_1w
- rule_action:wafsc_evaluations:rate1m:avg_over_time_1w offset 1w
, "offset", "1w", "", "")
or
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 334h)
+ rule_action:wafsc_evaluations:rate1m:avg_over_time_1w
- rule_action:wafsc_evaluations:rate1m:avg_over_time_1w offset 2w
, "offset", "2w", "", "")
or
label_replace(
avg_over_time(rule_action:wafsc_evaluations:rate1m[4h] offset 502h)
+ rule_action:wafsc_evaluations:rate1m:avg_over_time_1w
- rule_action:wafsc_evaluations:rate1m:avg_over_time_1w offset 3w
, "offset", "3w", "", "")
) without (offset)
- record: rule_action:wafsc_evaluations:rate1m:seasonal_prediction:z_score
expr: >
(
rule_action:wafsc_evaluations:rate1m
- rule_action:wafsc_evaluations:rate1m:seasonal_prediction
) / rule_action:wafsc_evaluations:rate1m:stddev_over_time_1w
```
|
There is work on retroactive rule evaluation happening in Prometheus already. Once that's figured out there, we'll probably implement the same mechanism in Thanos. We look at backfilling data more as a way to retrofit non-Prometheus data into the system; retroactive rule evaluations may make use of the same infrastructure but should be a first-class feature, at least eventually. |
Hello 👋 Looks like there was no activity on this issue for the last 30 days. |
Closing for now as promised, let us know if you need this to be reopened! 🤗 |
For those who found this issue via search: thanks to @bwplotka's and @dipack95's work on the Prometheus side, it is now possible to import custom data in the Prometheus text format into Thanos via https://github.com/sepich/thanos-kit/ |
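For a rough sense of the input such importers consume (the exact thanos-kit invocation isn't reproduced here, so consult its README): the Prometheus text exposition format accepts an optional timestamp, in milliseconds since the epoch, after each value, which is what lets historical samples be expressed at all. The metric and label names below are made up.

```
# HELP customer_usage_total Historical usage counter backfilled from an external system.
# TYPE customer_usage_total counter
customer_usage_total{customer="acme",region="eu"} 1027 1589630400000
customer_usage_total{customer="acme",region="eu"} 1043 1589634000000
customer_usage_total{customer="acme",region="eu"} 1099 1589637600000
```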
Currently, we have lots of backfill solutions that rely on block upload. We should invest in them and make them better. BUT this issue is about remote write backfill, which is being designed and developed by the amazing Grafana team! PTAL there. |
Since the OOO feature has been merged into main, I will close this issue. Feel free to reopen if you think it is not addressed. |
As per our discussion here, we decided to enable Remote Write backfilling.
Thoughts so far:
Help wanted, but the topic is extremely difficult; a design is a must-have up front.
cc @gouthamve @pracucci @brancz @RichiH, @tomwilkie @squat
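To make the scope of "Remote Write backfilling" concrete, here is a hedged sketch of what a backfill client amounts to on the wire: old samples packed into a prompb.WriteRequest, snappy-compressed, and POSTed with the standard remote-write headers. The metric, tenant label, and endpoint URL are placeholders, and a receiver without out-of-order/backfill support will simply reject samples this old, which is exactly the gap this proposal is about.

```go
// Sketch of pushing old samples over the Prometheus remote-write protocol.
// Endpoint, metric, and labels are placeholders for illustration only.
package main

import (
	"bytes"
	"log"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	// One series with samples starting 6 hours ago; timestamps are milliseconds.
	start := time.Now().Add(-6 * time.Hour)
	var samples []prompb.Sample
	for i := 0; i < 10; i++ {
		samples = append(samples, prompb.Sample{
			Value:     float64(i),
			Timestamp: start.Add(time.Duration(i) * time.Minute).UnixMilli(),
		})
	}

	req := prompb.WriteRequest{
		Timeseries: []prompb.TimeSeries{{
			Labels: []prompb.Label{
				{Name: "__name__", Value: "backfilled_metric"}, // placeholder metric name
				{Name: "tenant", Value: "demo"},
			},
			Samples: samples,
		}},
	}

	// prompb types are gogo-protobuf generated, so they carry Marshal().
	raw, err := req.Marshal()
	if err != nil {
		log.Fatal(err)
	}
	body := snappy.Encode(nil, raw)

	// Standard remote-write headers; the receive URL is a placeholder.
	httpReq, err := http.NewRequest(http.MethodPost, "http://thanos-receive:19291/api/v1/receive", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	httpReq.Header.Set("Content-Encoding", "snappy")
	httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")

	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Printf("receive responded with %s", resp.Status)
}
```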