Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a single index with wildcard in Elasticsearch reader #1969

Closed

Conversation

pavolloffay
Copy link
Member

@pavolloffay pavolloffay commented Dec 9, 2019

Resolves #1361

Use a wildcard (jaeger-span-*) in queries instead of the concrete list of indices (jaeger-span-2018-12-24, jaeger-span-2018-12-25...).

The motivation for this change:

Possible negative impacts:

  • Performance - although kibana also uses wildcard pattern therefore time range queries (on span timestamp) should be well optimized.

Docker images to test:

pavolloffay/jaeger-query:es-wildcard
pavolloffay/jaeger-collector:es-wildcard
  • GetTrace, GetServices and GetOperatios used --es.max-span-age to compose a list of historical indices for the query - now query all indices via the wildcard
  • FidTraceIDs and FindTraces use times from query parameters to compose the list of historical indices - now query all indices but they use timestamp range query.

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@codecov
Copy link

codecov bot commented Dec 9, 2019

Codecov Report

Merging #1969 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1969      +/-   ##
==========================================
- Coverage   96.99%   96.99%   -0.01%     
==========================================
  Files         203      203              
  Lines       10061    10034      -27     
==========================================
- Hits         9759     9732      -27     
  Misses        264      264              
  Partials       38       38
Impacted Files Coverage Δ
plugin/storage/es/factory.go 100% <ø> (ø) ⬆️
plugin/storage/es/spanstore/reader.go 100% <100%> (ø) ⬆️
plugin/storage/es/dependencystore/storage.go 82.6% <100%> (-3.11%) ⬇️
plugin/storage/es/options.go 100% <100%> (ø) ⬆️
plugin/storage/es/spanstore/service_operation.go 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1bc28d...8797957. Read the comment docs.

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@pavolloffay
Copy link
Member Author

@jaegertracing/elasticsearch we are looking for volunteers to test this functionality especially if the query performance changed. Any feedback is appreciated. The docker images are listed in the PR description.

@pavolloffay pavolloffay changed the title Use wildcard in Elasticsearch indices in reader Use a single index with wildcard in Elasticsearch reader Dec 9, 2019
Copy link
Member

@annanay25 annanay25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also change the esCleaner.py script as part of this PR and allow users to specify a max TTL for span data?

nsConfig.MaxSpanAge,
"The maximum lookback for spans in Elasticsearch")
time.Hour*72,
"(deprecated) The maximum lookback for spans in Elasticsearch. Now all indices are searched.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems confusing. Can we add the message Now all indices are searched in the parenthesis?

@pavolloffay
Copy link
Member Author

Should we also change the esCleaner.py script as part of this PR and allow users to specify a max TTL for span data?

I am not sure if I follow you. The purpose of esCleaner.py script is to clean old data. Elasticsearch does not allow to specify TTL in any form.

@annanay25
Copy link
Member

@pavolloffay Sorry I didn't frame that properly.

Elasticsearch does not allow to specify TTL in any form.

Yes. What I meant was that should we modify the esCleaner.py script as part of this PR as well, to include the change made in index naming format, so that users can continue to specify how long they want to store data.

@pavolloffay
Copy link
Member Author

There is no change in index name format in this PR. The writer still writes to daily indices.

@pavolloffay
Copy link
Member Author

@jkandasa run performance tests to measure query time with this patch. The results prove increased query time by on average 300-400%. Therefore I am closing this PR.

https://docs.google.com/spreadsheets/d/1rwTzDvnHHhssNdZibhFuHBZDf6njt6b1DrvRiJrkt14/edit?usp=sharing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use NumericRangeQuery in ES queries when rollover is used
2 participants