
Expectation of working when Target Allocator is down #2159

Closed
sfc-gh-akrishnan opened this issue Sep 25, 2023 · 3 comments
Labels
area:target-allocator (Issues for target-allocator), question (Further information is requested)

Comments

@sfc-gh-akrishnan

When the Target Allocator is unreachable or down for some period of time, what is the expected behavior from the OTel Collector's perspective?

Assumption:

  • All the OTel Collector pods have received the configuration of targets to scrape at least once

Possible work-patterns:

  • Continue scraping endpoints with the previously received config
  • Stop scraping until the Target Allocator becomes available for scraping again

TIA : )

jaronoff97 added the question and area:target-allocator labels on Sep 26, 2023
@sfc-gh-akrishnan
Author

Hi Folks,
Requesting some guidance on this item. Thank you :)

@sfc-gh-akrishnan
Author

@jaronoff97, bringing this to your attention; could you help me in this regard, or tag the right people who can answer the question?

TIA

@jaronoff97
Contributor

To answer the immediate question: we recommend running the Target Allocator with the consistent-hashing allocation strategy and at least 2 replicas to enable a high-availability mode for the scenario where the Target Allocator is down. For now there is no fallback in the collector, as that could potentially cause duplicate scrapes. This is a design choice made with the CAP theorem in mind: imagine there's a network partition between the collectors and the Target Allocator; we opt for the collectors to remain available if they are handling other workloads (tracing and logs) and correct (they're not sending incorrect metric data). Basically, we are doing the second approach you mentioned, where we simply stop scraping because those scrapes would fail.

By the way, if you have more questions, our group is a bit more responsive on Slack if there's any urgency to them.
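For readers landing here, below is a minimal sketch of the high-availability setup described above, assuming the operator's v1alpha1 OpenTelemetryCollector CRD (targetAllocator.enabled, targetAllocator.replicas, targetAllocator.allocationStrategy). The resource name, scrape job, and exporter are illustrative placeholders, and exact field names may vary between operator versions:

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: prometheus-collector        # hypothetical name
spec:
  mode: statefulset                 # statefulset mode is typically used with the Target Allocator
  targetAllocator:
    enabled: true
    # Run at least 2 allocator replicas so a single pod failure does not
    # leave the collectors without an allocator to query.
    replicas: 2
    # consistent-hashing keeps target-to-collector assignments stable
    # across allocator replicas, avoiding duplicate scrapes.
    allocationStrategy: consistent-hashing
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
            - job_name: example-app        # placeholder scrape job
              scrape_interval: 30s
              static_configs:
                - targets: ['0.0.0.0:8080']
    exporters:
      logging: {}                          # placeholder exporter
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [logging]
```

With two allocator replicas, a collector that loses one replica can still fetch target assignments from the other; as noted in the comment above, if all replicas are unreachable the collector simply fails those scrapes rather than falling back to a stale assignment.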
