Support regex tags search for Elasticseach backend #2049

annanay25 · 2020-01-30T06:41:40Z

Signed-off-by: Annanay annanayagarwal@gmail.com

Which problem is this PR solving?

When using the ES backend, adds the ability to

use regex search for the values field. So a query like foo=bar.* should match all tags like {foo:bar}, {foo:barbaz}, {foo:bar1}

Related to #684

Short description of the changes

Swapped the NewMatchQuery method in ES reader for NewRegexpQuery

/cc @pavolloffay

Signed-off-by: Annanay <annanayagarwal@gmail.com>

codecov · 2020-01-30T06:57:05Z

Codecov Report

Merging #2049 into master will decrease coverage by 1.09%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2049      +/-   ##
==========================================
- Coverage   97.39%   96.29%   -1.10%     
==========================================
  Files         207      214       +7     
  Lines       10286    10535     +249     
==========================================
+ Hits        10018    10145     +127     
- Misses        223      332     +109     
- Partials       45       58      +13

Impacted Files	Coverage Δ
cmd/agent/app/agent.go	`87.50% <0.00%> (-12.50%)`	⬇️
cmd/query/app/static_handler.go	`86.84% <0.00%> (-1.76%)`	⬇️
...lugin/sampling/strategystore/adaptive/processor.go	`99.20% <0.00%> (-0.80%)`	⬇️
cmd/env/command.go	`100.00% <0.00%> (ø)`
cmd/query/app/flags.go	`100.00% <0.00%> (ø)`
cmd/agent/app/builder.go	`100.00% <0.00%> (ø)`
model/adjuster/clockskew.go	`100.00% <0.00%> (ø)`
cmd/collector/app/metrics.go	`100.00% <0.00%> (ø)`
cmd/collector/app/options.go	`100.00% <0.00%> (ø)`
pkg/config/tlscfg/options.go	`100.00% <0.00%> (ø)`
... and 43 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0e974ce...0d2fb5e. Read the comment docs.

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 · 2020-01-31T07:31:35Z

I've added the "search for presence of a tag" functionality to the cassandra reader as well, but I can submit that in a different PR

pavolloffay · 2020-01-31T10:28:39Z

search for presence of a tag

Any ideas how we could support this in UI?

Could you please add integration tests for regex and also escaped version e.g. find exactly hello*.

pavolloffay · 2020-01-31T10:31:47Z

It would be better to split the PR for regex and missing tags to two separate PRs.

annanay25 · 2020-02-03T11:45:51Z

@pavolloffay - Please review.

This reverts commit f94c220. Signed-off-by: Annanay <annanayagarwal@gmail.com>

Signed-off-by: Annanay <annanayagarwal@gmail.com>

yurishkuro · 2020-02-03T15:03:00Z

What are the performance implications of always forcing a regex search?

annanay25 · 2020-02-04T12:52:54Z

@yurishkuro From what I understand, Elasticsearch performance degrades with decrease in prefix length before the wildcard(operator) in the regex query. For ex: http.url=*.example.com will have worse performance as compared to http.url=example.c*

I could not find performance benchmarks for a Match query vs Regexp query for when no operator (*,?,etc) is used.

pavolloffay · 2020-02-04T15:46:04Z

I am also interested in performance results ad tag queries are common. Could you please run some perf tests to measure the query time?

I have used this https://github.com/pavolloffay/jaeger-perf-tests before and also @jkandasa should have some perf framework, although I am not sure if that can be used without our infrastructure.

pavolloffay · 2020-02-04T15:46:57Z

If there is perf degradation maybe we could use regex type only if there is regex in the query.

jkandasa · 2020-02-04T23:27:37Z

@annanay25 we have a tool to measure query timings. https://github.com/jkandasa/jaegerperf

annanay25 · 2020-02-05T11:47:19Z

@pavolloffay - Thanks. Is there a guide on which metrics need to be monitored after running the perf-tests?

@jkandasa - Could you please provide some more information on the tool? Does it perf-test the collector/query component, etc? The README doesn't say much.

annanay25 · 2020-02-05T11:50:29Z

@pavolloffay - Also, in the integration test, the "Tag Name + Operation Name" query seems to match the tags_regex_trace.json as well, even though they search on a different ServiceName and tag value.

I'm not sure I understand why that is happening, could you please take a look?

pavolloffay · 2020-02-05T12:23:35Z

plugin/storage/integration/fixtures/queries.json

+    "ExpectedFixtures": ["tags_regex_trace"]
+  },
+  {
+    "Caption": "Tag regex + Operation name + max Duration - Multiple traces",


@pavolloffay - Also, in the integration test, the "Tag Name + Operation Name" query seems to match the tags_regex_trace.json as well, even though they search on a different ServiceName and tag value.

@annanay25 do you mean this one?

No, the one above, "Tag regex + Operation name + max Duration"

It is supposed to match with just the tags_regex_trace but matches with both tags_regex_trace and tags_opname_trace .

This one also looks wrong as it uses different service name https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/integration/fixtures/traces/tags_opname_trace.json

Could you please debug it and or open an issue?

How did you fix the issue?

I had to mangle the traceID further -
0955efa#diff-1824d3ce6ce534210f34269d2370e966

Signed-off-by: Annanay <annanayagarwal@gmail.com>

jkandasa · 2020-02-06T09:21:50Z

@jkandasa - Could you please provide some more information on the tool? Does it perf-test the collector/query component, etc? The README doesn't say much.

@annanay25 This tool can send spans to the collector component. and can be used to measure jaeger-query API execution time. Both actions are independent.
I have updated the README file. Let me know still it is not clear.

annanay25 · 2020-02-12T10:50:42Z

@jkandasa - I was able to setup your tool to do the performance tests. I loved the simplicity, and the concept of creating jobs. I have a few suggestions from my experience of using it -

It might be helpful to mention somewhere in the README that the QueryRunner will generate a JSON with results once it completes. It was not obvious to me, at first
It would also be useful to have some guidelines on how two generated-jsons can be compared to get an idea of performance
In my latest run I get a constant contentLength of 56 in all summary results, seems fishy, is there an explanation?
In the Jobs UI page, clicking on the "delete job" icon shows "Status: Completed" even for a running job.
I'm also attaching the kubernetes yaml I wrote to set this up - please feel free to include it in the repo if you wish as it might be useful to others.

annanay25 · 2020-02-14T09:26:09Z

Perf results:

From master build -

{
    "summary": [
     {
      "contentLength": 242,
      "elapsed": 4388,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "3.last 2 days limit 1000",
      "samples": 10
     },
     {
      "contentLength": 304,
      "elapsed": 5092,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "4.last 7 days limit 2000",
      "samples": 10
     },
     {
      "contentLength": 282153,
      "elapsed": 203237,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "1.last 12 hours limit 100",
      "samples": 10
     },
     {
      "contentLength": 2923946,
      "elapsed": 2138975,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "1.last 12 hours limit 1000",
      "samples": 10
     },
     {
      "contentLength": 2970425,
      "elapsed": 21746050,
      "errorPercentage": 50,
      "errorsCount": 5,
      "name": "1.last 12 hours limit 2000",
      "samples": 10
     },
     {
      "contentLength": 229,
      "elapsed": 93426,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "2.last 24 hours limit 100",
      "samples": 10
     },
     {
      "contentLength": 229,
      "elapsed": 81108,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "2.last 24 hours limit 1000",
      "samples": 10
     },
     {
      "contentLength": 229,
      "elapsed": 7399,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "2.last 24 hours limit 2000",
      "samples": 10
     },
     {
      "contentLength": 242,
      "elapsed": 4355,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "3.last 2 days limit 100",
      "samples": 10
     },
     {
      "contentLength": 242,
      "elapsed": 4051,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "3.last 2 days limit 2000",
      "samples": 10
     },
     {
      "contentLength": 304,
      "elapsed": 4632,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "4.last 7 days limit 100",
      "samples": 10
     },
     {
      "contentLength": 304,
      "elapsed": 4352,
      "errorPercentage": 100,
      "errorsCount": 10,
      "name": "4.last 7 days limit 1000",
      "samples": 10
     }
    ]
   }

With this PR -

{
    "summary": [
     {
      "contentLength": 3546187,
      "elapsed": 2750042,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "3.last 2 days limit 2000",
      "samples": 10
     },
     {
      "contentLength": 232174,
      "elapsed": 997305,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "1.last 12 hours limit 100",
      "samples": 10
     },
     {
      "contentLength": 2609034,
      "elapsed": 10571863,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "1.last 12 hours limit 1000",
      "samples": 10
     },
     {
      "contentLength": 3546187,
      "elapsed": 12699653,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "1.last 12 hours limit 2000",
      "samples": 10
     },
     {
      "contentLength": 2609034,
      "elapsed": 2888810,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "2.last 24 hours limit 1000",
      "samples": 10
     },
     {
      "contentLength": 3546187,
      "elapsed": 4955535,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "2.last 24 hours limit 2000",
      "samples": 10
     },
     {
      "contentLength": 230222,
      "elapsed": 571004,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "3.last 2 days limit 100",
      "samples": 10
     },
     {
      "contentLength": 2609034,
      "elapsed": 1898116,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "3.last 2 days limit 1000",
      "samples": 10
     },
     {
      "contentLength": 230222,
      "elapsed": 379544,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "4.last 7 days limit 100",
      "samples": 10
     },
     {
      "contentLength": 2609034,
      "elapsed": 1753636,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "4.last 7 days limit 1000",
      "samples": 10
     },
     {
      "contentLength": 3546187,
      "elapsed": 2979869,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "4.last 7 days limit 2000",
      "samples": 10
     },
     {
      "contentLength": 230222,
      "elapsed": 1225345,
      "errorPercentage": 0,
      "errorsCount": 0,
      "name": "2.last 24 hours limit 100",
      "samples": 10
     }
    ]
   }

Bunch of the test results from the master build give an errorPercentage: 100

jkandasa · 2020-02-14T11:16:01Z

@annanay25 Thank you for the comments and feedback on the tool. I have added a metrics page to compare the results
.

It might be helpful to mention somewhere in the README that the QueryRunner will generate a JSON with results once it completes. It was not obvious to me, at first

Updated.

It would also be useful to have some guidelines on how two generated-jsons can be compared to get an idea of performance

Metrics page includeded

In my latest run I get a constant contentLength of 56 in all summary results, seems fishy, is there an explanation?

In this case, you may see errors or no data for the specified timerange. Please refer JSON result. metrics >> raw
Example:

{
       "contentLength": 194,
       "elapsed": 15166842,
       "errorMessage": "",
       "others": {
        "errors": [
         {
          "code": 500,
          "msg": "Get https://elasticsearch:9200/_msearch: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
}

In the Jobs UI page, clicking on the "delete job" icon shows "Status: Completed" even for a running job.

Fixed

I'm also attaching the kubernetes yaml I wrote to set this up - please feel free to include it in the repo if you wish as it might be useful to others.

Thank you. I have modified and added in the source code.

annanay25 · 2020-02-14T17:52:41Z

Seems like there is some performance degradation, how do you suggest this can be improved @pavolloffay?

pavolloffay · 2020-02-17T08:43:33Z

@annanay25 could you please create a summary of the test results? For instance what types of queries take how much time in % against master/head #1969 (comment)

annanay25 · 2020-02-18T10:19:32Z

@pavolloffay - https://docs.google.com/spreadsheets/d/1mzJNQYv1xF0fiMAcpXk5TlNVE3GIqZOlZv2iDZqkraI/edit?usp=sharing

On my system, after one run the perf tests start giving a 100% error for most queries in the master build (1.16) - can someone else test this?

pavolloffay · 2020-02-19T08:22:07Z

@annanay25 the results do not look good, the performance degradation is quite significant.

I would suggest detecting the regex and changing the query type on the fly only when it is necessary.

annanay25 · 2020-02-19T08:42:53Z

@pavolloffay - Tests on my machine are quite inconsistent (PTAL at round 2, numbers are in complete contrast with round 1)

Can someone else run a benchmark on their machines? I can publish my image on dockerhub for this

pavolloffay · 2020-02-19T08:58:45Z

@annanay25 could you please build the query and collector image and publish it? @jkandasa would you be able to run the same tests as in #1969 ?

jkandasa · 2020-02-19T09:49:01Z

@pavolloffay ok I will do.
@annanay25 can you publish the collector and query image?

annanay25 · 2020-02-19T11:24:00Z

Thanks @jkandasa.
For collector the jaegertracing/jaeger-collector image can be used, as there are no changes for the collector in this PR.
For query, I have pushed the image at https://hub.docker.com/r/annanay25/jaeger-query

smpavlenko · 2020-02-28T13:54:40Z

Hello,
is there any ETA when it will be released?

annanay25 · 2020-02-29T08:17:57Z

@jkandasa - Did you get a chance to take a look at this?

jkandasa · 2020-03-02T12:56:46Z

@annanay25 here is the result:

Detailed file: https://docs.google.com/spreadsheets/d/1fIOYd4LJHHcmOIlEYuR9TQlAv4SXWdddy22zPmymvvw/edit?usp=sharing

Image Details:

1.17.0

Collecotor: jaegertracing/jaeger-collector:1.17.0
Query: jaegertracing/jaeger-query:1.17.0

pr Support regex tags search for Elasticseach backend #2049

Collector: jaegertracing/jaeger-collector:1.17.0
Query: annanay25/jaeger-query:latest (docker.io/annanay25/jaeger-query@sha256:3d700db8fb31b43cced6ecbfb7c17452165fa3def0525b9e7b366b90de9f1f6d)

OCP version

Server Version: 4.3.1
Kubernetes Version: v1.16.2

Test data:

target: "collector"
endpoint: http://jaegerqe-collector:14268/api/traces
serviceName: jaegerperf_generator
mode: history
numberOfDays: 10
spansPerDay: 60000
spansPerSecond: 300 # maximum spans limit/sec
childDepth: 2
tags: 
  spans_generator: "jaegerperf"
  days: 10
startTime:

Query test raw data:

query_test_raw_data.zip

annanay25 · 2020-03-04T07:27:04Z

Thank you for the results @jkandasa.

@pavolloffay - Could you please review these?

pavolloffay

It looks good, just minor comments in tests.

@jkandasa thanks for running the tests!

plugin/storage/integration/elasticsearch_test.go

plugin/storage/integration/fixtures/queries_es.json

pavolloffay · 2020-03-04T13:31:58Z

@annanay25 could you please provide more description in the first comment? E.g. what fields could be searched with regex. Is it only values or the keys?

Does it work with our UI?

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 · 2020-03-04T14:30:57Z

Thanks for the review, @pavolloffay. I've addressed all comments. PTAL.

pavolloffay · 2020-03-04T14:38:14Z

should match all tags like

what about logs?

plugin/storage/integration/elasticsearch_test.go

Signed-off-by: Annanay <annanayagarwal@gmail.com>

naeimehmhm · 2020-05-13T13:52:10Z

Is there any update on having regex in tag search?

Add regex tag filter for ES backend

ca65f26

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 requested a review from a team as a code owner January 30, 2020 06:41

annanay25 requested a review from jpkrohling January 30, 2020 06:41

[ES] Add support to check for presence of a tag

f94c220

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 changed the title ~~Add regex tag filter for ES backend~~ Improve query functionality for ES backend Jan 30, 2020

yurishkuro requested a review from pavolloffay January 30, 2020 16:31

pavolloffay added the storage/elasticsearch label Jan 31, 2020

annanay25 force-pushed the tag-regex branch from 5232c04 to 35e7f64 Compare February 3, 2020 11:46

annanay25 added 2 commits February 3, 2020 17:22

Revert "[ES] Add support to check for presence of a tag"

d9afd37

This reverts commit f94c220. Signed-off-by: Annanay <annanayagarwal@gmail.com>

Add query builder and integration tests

6f87b93

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 force-pushed the tag-regex branch from 35e7f64 to 6f87b93 Compare February 3, 2020 11:52

pavolloffay reviewed Feb 5, 2020

View reviewed changes

annanay25 added 2 commits February 6, 2020 01:04

Fix tests

0955efa

Signed-off-by: Annanay <annanayagarwal@gmail.com>

Add ES specific integration tests

af7baa6

Signed-off-by: Annanay <annanayagarwal@gmail.com>

annanay25 mentioned this pull request Feb 7, 2020

[ES, C*] Check for the presence of a tag #2054

Closed

pavolloffay reviewed Mar 4, 2020

View reviewed changes

pavolloffay changed the title ~~Improve query functionality for ES backend~~ Support regex tags search for Elasticseach backend Mar 4, 2020

Improve tag search tests

039e6c9

Signed-off-by: Annanay <annanayagarwal@gmail.com>

pavolloffay reviewed Mar 4, 2020

View reviewed changes

plugin/storage/integration/elasticsearch_test.go Outdated Show resolved Hide resolved

plugin/storage/integration/elasticsearch_test.go Outdated Show resolved Hide resolved

Nit, clean function params

0d2fb5e

Signed-off-by: Annanay <annanayagarwal@gmail.com>

pavolloffay approved these changes Mar 4, 2020

View reviewed changes

pavolloffay merged commit e4a6ee6 into jaegertracing:master Mar 4, 2020

mircea-pop mentioned this pull request Mar 31, 2020

Better documentation for tag search jaegertracing/jaeger-ui#171

Open

idohalevi mentioned this pull request Oct 9, 2020

Search traces with non-string tags #2540

Closed

Support regex tags search for Elasticseach backend #2049

Support regex tags search for Elasticseach backend #2049

Conversation

annanay25 commented Jan 30, 2020 • edited Loading

Which problem is this PR solving?

Short description of the changes

codecov bot commented Jan 30, 2020 • edited Loading

Codecov Report

annanay25 commented Jan 31, 2020

pavolloffay commented Jan 31, 2020

pavolloffay commented Jan 31, 2020

annanay25 commented Feb 3, 2020

yurishkuro commented Feb 3, 2020

annanay25 commented Feb 4, 2020

pavolloffay commented Feb 4, 2020

pavolloffay commented Feb 4, 2020

jkandasa commented Feb 4, 2020

annanay25 commented Feb 5, 2020

annanay25 commented Feb 5, 2020

pavolloffay Feb 5, 2020

Choose a reason for hiding this comment

annanay25 Feb 5, 2020

Choose a reason for hiding this comment

pavolloffay Feb 5, 2020

Choose a reason for hiding this comment

pavolloffay Feb 7, 2020

Choose a reason for hiding this comment

annanay25 Feb 7, 2020

Choose a reason for hiding this comment

jkandasa commented Feb 6, 2020 • edited Loading

annanay25 commented Feb 12, 2020 • edited Loading

annanay25 commented Feb 14, 2020

jkandasa commented Feb 14, 2020

annanay25 commented Feb 14, 2020

pavolloffay commented Feb 17, 2020

annanay25 commented Feb 18, 2020

pavolloffay commented Feb 19, 2020

annanay25 commented Feb 19, 2020

pavolloffay commented Feb 19, 2020

jkandasa commented Feb 19, 2020

annanay25 commented Feb 19, 2020

smpavlenko commented Feb 28, 2020

annanay25 commented Feb 29, 2020

jkandasa commented Mar 2, 2020

Image Details:

Query test raw data:

annanay25 commented Mar 4, 2020

pavolloffay left a comment

Choose a reason for hiding this comment

pavolloffay commented Mar 4, 2020 • edited Loading

annanay25 commented Mar 4, 2020

pavolloffay commented Mar 4, 2020

naeimehmhm commented May 13, 2020

annanay25 commented Jan 30, 2020 •

edited

Loading

codecov bot commented Jan 30, 2020 •

edited

Loading

jkandasa commented Feb 6, 2020 •

edited

Loading

annanay25 commented Feb 12, 2020 •

edited

Loading

pavolloffay commented Mar 4, 2020 •

edited

Loading