
No metrics displayed with Thanos engine ranged query (go_routines[1d]) in distributed query installation (Prometheus engine returns correct output) #8078

Open
s0rl0v opened this issue Jan 29, 2025 · 6 comments

Comments

s0rl0v commented Jan 29, 2025

Thanos, Prometheus and Golang version used:
Thanos - v0.37.2
Prometheus - v2.55.1
Golang - v1.23.4

Object Storage Provider:
Azure
Huawei OBS

What happened:
I've set up Thanos Query in distributed mode with the following stores (aka local queries) in config:

  • query-az.int.zone
  • query-hw.int.zone

extraArgs:

  • --query.timeout=5m
  • --query.mode=distributed

The local queries have Thanos Gateway hostnames in their corresponding configurations (Prometheus installations with Thanos sidecars).
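For reference, the setup described above corresponds roughly to flags like the following. This is a sketch reconstructed from the description, not the actual config: the port (10901) and the use of `--endpoint` rather than `--store` are assumptions, so adjust to the real deployment.

```shell
# Global querier (distributed mode) -- fans out to the two local queries.
thanos query \
  --query.mode=distributed \
  --query.timeout=5m \
  --query.replica-label=site \
  --endpoint=query-az.int.zone:10901 \
  --endpoint=query-hw.int.zone:10901

# The local queriers run WITHOUT --query.mode=distributed and point at the
# Thanos Gateways / sidecars of their own region.
```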

When I run a range query (e.g. go_threads[1d]) against the distributed query with the Thanos engine, it returns nothing:
[screenshot]

With the Prometheus engine, it returns the correct output:
[screenshot]

Querying a local query directly also yields correct results with either the Thanos or Prometheus engine:
[screenshot]

What you expected to happen:
The distributed query returns the same output with the Prometheus and Thanos engines.

How to reproduce it (as minimally and precisely as possible):

  1. Have a Thanos installation with Thanos Gateways.
  2. Set up a distributed Query with two local queries passed via --store extra args.
  3. Query any ranged metric (e.g. go_threads[1d]) with the Thanos engine.
  4. See nothing.

Full logs to relevant components:
No errors whatsoever.

Anything else we need to know:

dosubot bot commented Jan 29, 2025

Hey there, @s0rl0v! I'm here to assist you with any bugs, questions, or contributions you might have. Let's tackle this together!

The issue you're experiencing with Thanos not displaying metrics for range queries in distributed mode could be related to several factors specific to the Thanos engine's handling of distributed queries. Here are some potential areas to investigate:

  1. Query Mode Configuration: Ensure that the --query.mode=distributed flag is correctly set for all relevant components. This mode is essential for enabling distributed query execution across multiple stores [1].

  2. Thanos Gateway Configuration: Verify that the Thanos Gateways are correctly configured and accessible from the Thanos Query component. Any misconfiguration here could lead to issues in fetching data from the stores.

  3. Replica Labels and Deduplication: Check if the --query.replica-label is set correctly to handle deduplication across your stores. Incorrect deduplication settings might result in missing data in the query results [2].

  4. Partial Response Handling: Consider enabling partial responses using the --query.partial-response flag. This can help in scenarios where some stores might not be responding, allowing the query to return partial results instead of nothing [2].

  5. Logs and Debugging: Although you mentioned there are no errors, enabling more verbose logging might provide additional insights into what might be going wrong during the query execution.

  6. Version-Specific Issues: While there are no known issues specifically documented for Thanos v0.37.2 related to distributed queries, it's worth checking the release notes or any recent issues in the Thanos repository for any updates or patches that might address this behavior.

By reviewing these configurations and settings, you might be able to identify the root cause of the issue with the Thanos engine in distributed mode.


s0rl0v commented Jan 29, 2025

  1. Query Mode Configuration: Ensure that the --query.mode=distributed flag is correctly set for all relevant components. This mode is essential for enabling distributed query execution across multiple stores [1].

The flag is set only for the global query; the local ones run without it.

  2. Thanos Gateway Configuration: Verify that the Thanos Gateways are correctly configured and accessible from the Thanos Query component. Any misconfiguration here could lead to issues in fetching data from the stores.

The gateways are present in the Stores menu, and there are no errors in the query logs associated with store unavailability.

  3. Replica Labels and Deduplication: Check if the --query.replica-label is set correctly to handle deduplication across your stores. Incorrect deduplication settings might result in missing data in the query results [2].

The replica label is set to "site"; Azure metrics have the value "azure", Huawei ones "huawei".

  4. Partial Response Handling: Consider enabling partial responses using the --query.partial-response flag. This can help in scenarios where some stores might not be responding, allowing the query to return partial results instead of nothing [2].

Enabled.

  5. Logs and Debugging: Although you mentioned there are no errors, enabling more verbose logging might provide additional insights into what might be going wrong during the query execution.

I'll check this out.

  6. Version-Specific Issues: While there are no known issues specifically documented for Thanos v0.37.2 related to distributed queries, it's worth checking the release notes or any recent issues in the Thanos repository for any updates or patches that might address this behavior.

There are no known issues; I used the search before creating this issue.
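For context, the replica-label deduplication discussed above can be sketched conceptually: drop the replica labels, then collapse series whose remaining label sets match. This is an illustrative model, not Thanos's actual implementation (Thanos deduplicates at the sample level with penalty-based merging; this sketch only shows the labeling logic).

```python
def dedup(series, replica_labels):
    """Drop replica labels, then keep one series per remaining label set.

    `series` is a list of (labels_dict, samples) pairs.
    """
    seen = {}
    for labels, samples in series:
        # The deduplication key is the label set minus the replica labels.
        key = tuple(sorted((k, v) for k, v in labels.items()
                           if k not in replica_labels))
        # Keep the first series seen for each deduplicated label set.
        seen.setdefault(key, (dict(key), samples))
    return list(seen.values())

# With replica label "site", the Azure and Huawei copies collapse into one.
series = [
    ({"__name__": "go_threads", "site": "azure"},  [8, 8, 9]),
    ({"__name__": "go_threads", "site": "huawei"}, [8, 8, 9]),
]
print(dedup(series, {"site"}))  # one series, with "site" dropped
```

Note that a misconfigured replica label (e.g. spelled differently on one side) would leave both copies in the result rather than producing an empty one, so dedup misconfiguration is a different failure mode from the empty output reported here.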

@s0rl0v s0rl0v changed the title No metrics displayed via Thanos engine ranged query (go_routines[1d]) in query distributes installation No metrics displayed with Thanos engine ranged query (go_routines[1d]) in query distributes installation (Prometheus engine returns correct output) Jan 29, 2025
@fpetkovski
Contributor

Could you click on the Endpoints tab and share a screenshot?


s0rl0v commented Jan 29, 2025

@fpetkovski Sure!
This is what the endpoints of the distributed query look like:
[screenshot]

These are the endpoints of a local query (non-distributed):
[screenshot]

@Lavaerius

Hi there. Is there a chance this is related to this issue?

#7757


ibrahimasow1 commented Mar 28, 2025

I have this exact same issue.
I defined --query.replica-label replica --query.replica-label host in the central querier.
Here are the debug logs from the querier when using the Prometheus query engine vs. the Thanos query engine.

# PROMETHEUS QUERY ENGINE
Mar 28 13:31:57 server thanos[269408]: ts=2025-03-28T13:31:57.600201832Z caller=proxy.go:320 level=debug component=proxy request="min_time:1743082317501 max_time:1743168717501 matchers:<name:\"__name__\" value:\"go_threads\" > max_resolution_window:3600000 aggregates:COUNT aggregates:SUM partial_response_strategy:ABORT without_replica_labels:\"host\" without_replica_labels:\"replica\" " msg="Series: started fanout streams" status="Store Addr: 10.0.1.88:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.88\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.10:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.10\", replica=\"replica-2\"} MinTime: 1742817892408 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.175:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.175\", replica=\"replica-1\"} MinTime: 1742817889672 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.204:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.204\", replica=\"replica-1\"} MinTime: 1742817904708 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.96:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.96\", replica=\"replica-2\"} MinTime: 1742817892386 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.148:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.148\", replica=\"replica-3\"} MinTime: 1742817887992 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.197:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.197\", replica=\"replica-1\"} MinTime: 1742817882624 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.217:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.217\", replica=\"replica-2\"} MinTime: 1742817880379 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.70:11901 LabelSets: 
{@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.70\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.243:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.243\", replica=\"replica-1\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.101:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.101\", replica=\"replica-1\"} MinTime: 1742817882624 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.171:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.171\", replica=\"replica-3\"} MinTime: 1742817909187 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.211:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.211\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.231:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.231\", replica=\"replica-1\"} MinTime: 1742817885892 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.64:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.64\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.112:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.112\", replica=\"replica-2\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.21:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.21\", replica=\"replica-3\"} MinTime: 1742812380035 MaxTime: 9223372036854775807 queried;Store Addr: 10.0.1.250:11901 LabelSets: {@thanos_compatibility_store_type=\"store\"},{host=\"10.0.1.250\", replica=\"replica-3\"} MinTime: 1742817888076 MaxTime: 9223372036854775807 queried"
# THANOS QUERY ENGINE
Mar 28 13:32:51 server thanos[269408]: ts=2025-03-28T13:32:51.019767929Z caller=remote_engine.go:250 level=debug msg="Executed remote query" query=go_threads[1d] time=46.534866ms remote_peak_samples=0 remote_total_samples=0

I was using thanos-0.34.1 and then upgraded to thanos-0.37.2, but I'm still facing the same issue.
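One way to narrow this down is to compare both engines directly over the HTTP API, bypassing the UI. The host below is a placeholder, and the `engine` and `dedup` request parameters are assumptions; verify them against the query API docs for your Thanos version.

```shell
# Same range selector, once per engine, against the distributed querier.
curl -s 'http://<global-querier>:9090/api/v1/query' \
  --data-urlencode 'query=go_threads[1d]' \
  --data-urlencode 'dedup=true' \
  --data-urlencode 'engine=prometheus'

curl -s 'http://<global-querier>:9090/api/v1/query' \
  --data-urlencode 'query=go_threads[1d]' \
  --data-urlencode 'dedup=true' \
  --data-urlencode 'engine=thanos'
```

If the first call returns data and the second returns an empty matrix, that isolates the problem to the Thanos engine's remote-execution path rather than the UI or store configuration.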
