Add design doc for processing query language. #4444

Merged
4 commits merged into open-telemetry:main on Jan 27, 2022

Conversation

anuraaga
Contributor

This captures ongoing discussions about a revamp of the processing pipeline.

@bogdandrutu @punya

@anuraaga anuraaga requested review from a team and codeboten November 17, 2021 02:39
@anuraaga
Contributor Author

As a reminder, here are notes from the initial review - I meant to capture most of them, but if I missed anything feel free to point it out!

  • type = “span” -> from
  • calling out labels to understand output more clearly
  • no type = resource
  • ilm
  • angularjs scope
  • basic queries hundreds of nanoseconds
  • transform
  • processing
  • point out it’s being implemented in contrib
  • delta / cumulative - processing
  • processing can be embedded
  • can have field name conflicts with OTLP
  • always prefix with scope
  • e.g., type = span and type = metric
  • pod.cpu.usage - multiple within same metric
  • drop attribute, then aggregate
  • collection of aligned points
  • array-like field
  • type
  • exists attribute
  • dan observiq logs + stanza
  • order of execution
  • losing optimizations because of side effects
  • FROM resource prevents the resource-splitting case
  • Clarify that operations are not independent
  • Keep cross-signal for farther down the line
  • keep as parameter to create_histogram
  • batchbyresource

@codecov

codecov bot commented Nov 17, 2021

Codecov Report

Merging #4444 (9d42af4) into main (db4aa87) will increase coverage by 0.03%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##             main    #4444      +/-   ##
==========================================
+ Coverage   90.69%   90.72%   +0.03%     
==========================================
  Files         178      179       +1     
  Lines       10356    10690     +334     
==========================================
+ Hits         9392     9699     +307     
- Misses        746      770      +24     
- Partials      218      221       +3     
Impacted Files Coverage Δ
exporter/loggingexporter/known_sync_error.go 0.00% <0.00%> (-80.00%) ⬇️
service/command.go 82.35% <0.00%> (-17.65%) ⬇️
config/configtelemetry/configtelemetry.go 89.74% <0.00%> (-10.26%) ⬇️
cmd/builder/internal/builder/main.go 59.74% <0.00%> (-7.29%) ⬇️
service/flags.go 86.36% <0.00%> (-6.82%) ⬇️
exporter/loggingexporter/logging_exporter.go 88.04% <0.00%> (-5.44%) ⬇️
internal/cgroups/cgroups.go 87.67% <0.00%> (-4.00%) ⬇️
processor/memorylimiterprocessor/memorylimiter.go 85.32% <0.00%> (-3.27%) ⬇️
service/internal/telemetrylogs/logger.go 85.71% <0.00%> (-1.79%) ⬇️
config/configgrpc/configgrpc.go 92.57% <0.00%> (-1.35%) ⬇️
... and 66 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@alolita
Member

alolita commented Nov 17, 2021

Thanks @anuraaga

Adding to Collector SIG meeting agenda for discussion.

Member

@bogdandrutu bogdandrutu left a comment

After reading the examples again, I fear that we are trying too hard to reduce the number of statements/words, and because of that we need to make some "assumptions"/"tradeoffs". I would suggest avoiding any optimizations like that in the first version and asking users to always fully specify (sometimes duplicating) the statements. Then we can start adding "optimizations", like a statement without a from applying to all signals, etc.

mechanism could be used for selecting metrics for temporality conversion as other cases, but it is expected that in
practice configuration will be limited.

The processors implementing this use case are `cumulativetodeltaprocessor` and `deltatorateprocessor`.
Member

deltatorateprocessor is not temporality related :D The name is a bit misleading in including "delta"; it is just a rate calculation (value/time_interval). This is more meaningful for delta, I agree, but the operation can be done on cumulative as well.

Comment on lines 130 to 132
Remove a forbidden attribute such as `http.request.header.authorization` from all telemetry.

`delete(attributes["http.request.header.authorization"])`
Member

I think this is nice, but a bit confusing. Does this remove the attribute from the "span.events" as well? What about the "resource.attributes"?

Contributor Author

For now, as on line 119, I defined all field accesses as fully specified, so I think it is not ambiguous. Let me know if I should phrase it differently.


Remove all attributes except for some

`keep(attributes, "http.method", "http.status_code") from metric`
Member

We refer to the "signals" as [traces/metrics/logs]; we may want to be consistent here and at least use the plural, and probably use "traces" instead of "spans"?

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 26, 2021
@anuraaga
Contributor Author

anuraaga commented Nov 29, 2021

Thanks @bogdandrutu - sorry for the delay. I descoped things like "from all telemetry" since, as you mentioned before, the collector has no support for this at all anyway. And the paths are absolute paths as defined here - hopefully that's the reduced scope you're talking about.

Also have started on a very hacky prototype of the AST processing

anuraaga/opentelemetry-collector-contrib#1

If anyone has any suggestions on test cases to solve / benchmark in the first versions, I can list them in the doc.

@github-actions github-actions bot removed the Stale label Nov 30, 2021
@tigrannajaryan
Member

@anuraaga this is very interesting. I have been thinking about a query language in the background for a while; it would be good to discuss this. Before you go ahead with the current design, let's discuss the alternatives and possible amendments. A couple of things I would be interested to explore:

SQL or No

Is an SQL-like language preferable, or can we go with an imperative style (e.g. more Python-like)? It doesn't seem like we are going to support complicated SQL-like facilities (e.g. joins), and SQL's verbosity seems superfluous.

Compare SQL-like vs Python-like below.

Example 1:
delete(attributes["http.request.header.authorization"]) from traces where name = "GET /cats"
with:
if name = "GET /cats": delete(attributes["http.request.header.authorization"])

Example 2:
set(attributes["k8s_pod"], resource.attributes["k8s.pod.name"]) from metrics
with:
attributes["k8s_pod"] = resource.attributes["k8s.pod.name"]

The from clause seems unnecessary since the input to the processor is a known data type.

Special syntax/lexical rules

Can we define the syntax such that it makes usage with OTel data specifically convenient? You mentioned a shorthand for accessing scoped data. This is the right idea, and I think we can spill/populate all "attributes" as top-level variables. Maybe we can also combine this with tailored lexical rules for identifiers that allow dot as a valid identifier character, and then we can do this instead of the above:

Example 1:
if name = "GET /cats": delete(http.request.header.authorization)
Example 2:
k8s_pod = resource.attributes["k8s.pod.name"]

The idea being that the most common cases are much more concise and pleasant to write. The downside is that you lose the dot as an operator, but maybe we don't need it and brackets are enough for traversal of objects.
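
To illustrate (a minimal sketch of the idea only, not anything from the doc or an existing implementation), the tailored lexical rule could be as small as treating '.' as a valid identifier character in a hand-written lexer, so that http.request.header.authorization lexes as a single token:

// Hypothetical helper for a hand-written lexer (requires importing "unicode").
// With '.' accepted as an identifier character, `http.request.header.authorization`
// lexes as one identifier token instead of identifiers joined by a '.' operator.
func isIdentChar(r rune) bool {
	return unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' || r == '.'
}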

Comment on lines 167 to 168
create_histogram("duration", end_time_nanos - start_time_nanos) from traces
keep(attributes, "http.method") from metrics where descriptor.metric_name = "duration"
Member

@tigrannajaryan tigrannajaryan Dec 1, 2021

Can you clarify what this does? Does this read a span and produce a metric? How is this possible with the current processor interface, where the processor's input/output data type is the same?

Contributor Author

Sorry, I had meant to remove these cases but forgot this one. When originally writing the doc I forgot about the one-signal-per-pipeline limitation.

In the future I think shared pipelines could be cool, but not for now, especially because collector core has no way of supporting that.

Create utilization metric from base metrics. Because navigation expressions only operate on a single piece of telemetry,
helper functions for reading values from other metrics need to be provided.

`create_gauge("pod.cpu.utilized", read_gauge("pod.cpu.usage") / read_gauge("node.cpu.limit")) from metrics`
Member

To make this possible the processor will need to store the metric data points referenced by read_gauge, right? Would that be the responsibility of the read_gauge implementation?

Contributor Author

Yeah, that's the idea, though it does need to be confirmed with code.
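
For reference, a rough sketch of what that could look like (names and structure are assumptions, not the doc's design): read_gauge would be backed by a small store of the latest observed value per metric name, populated as metrics flow through the processor.

import "sync"

// gaugeStore is a hypothetical cache the processor could maintain so that
// read_gauge("node.cpu.limit") can be resolved while processing another metric.
type gaugeStore struct {
	mu     sync.Mutex
	latest map[string]float64
}

// record is called as gauge data points flow through the processor.
func (s *gaugeStore) record(name string, value float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.latest[name] = value
}

// readGauge returns the last observed value for a metric name, if any.
func (s *gaugeStore) readGauge(name string) (float64, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.latest[name]
	return v, ok
}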

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector that referenced this pull request Dec 1, 2021
This builds on top of open-telemetry#4444
but uses Python-like syntax instead of SQL-like.
@tigrannajaryan
Member

For easier comparison I posted the alternative with Python-like syntax here: #4499

Contributor Author

@anuraaga anuraaga left a comment

Thanks for the comments @tigrannajaryan!

One of the main motivations of SQL-like was to take advantage of existing tooling or muscle memory. SQL itself has a grammar that is pretty general; I think the proposed syntax is actually a true subset of it. We probably don't want to provide a real programming language in these statements (or at least not yet :) ), so I'm worried the Python-derived syntax may end up too flexible. We also need to make sure to keep query-engine optimizations open, e.g. combining transformations with the same where clause. Of course, if it's just a difference in formatting, then if: and where could be implemented similarly, but the former seems like it could open a larger can of worms.
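
As a purely illustrative example of that kind of optimization (the type and field names below are assumptions, not from the doc), a planner could group parsed statements by their where clause so the condition is evaluated once per piece of telemetry and every matching invocation runs against it:

// parsedStatement is a hypothetical parsed form of one query statement.
type parsedStatement struct {
	whereClause string // normalized text of the where clause
	invocation  string // the call to execute, e.g. delete(...)
}

// groupByCondition buckets statements sharing a where clause so the engine
// can evaluate the condition once and then run every matching invocation.
func groupByCondition(statements []parsedStatement) map[string][]parsedStatement {
	groups := make(map[string][]parsedStatement)
	for _, s := range statements {
		groups[s.whereClause] = append(groups[s.whereClause], s)
	}
	return groups
}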

The from clause mostly comes from my original idea of having a pipeline with all the signals, forgetting that this isn't possible. However, I think having it even from the beginning may be worth it to allow for this in the future. Even in an initial rendition, it seems like the UX can be simpler by only configuring the processor once and applying it to each signal pipeline - the processor would smartly use only the rules that actually make sense.

Let me know what you think of these thoughts. /cc @punya as well, who I believe has been a proponent of SQL-like syntax.


@tigrannajaryan
Member

@anuraaga I am not opposed to SQL-like, but I think it may be worth spending a bit more time exploring what can give us the best usability. I am not completely sure SQL is the most familiar language for our target audience.

What do you think about putting together a side-by-side comparison of the examples that you have, plus some other interesting use cases, and seeing what it looks like? I had also looked into SPL (Splunk's query language) as a source of inspiration, so it may be useful to see if we can borrow any useful ideas from there.

If we are totally brave it may even make sense to compare 3 possibilities side-by-side: SQL-like, Python-like and homegrown.

I am happy to help/work on this together if you want.

@anuraaga
Contributor Author

anuraaga commented Dec 3, 2021

@tigrannajaryan That sounds like a good idea - let me try to merge the snippets from your PR so we can see things side by side.

@anuraaga
Contributor Author

anuraaga commented Dec 9, 2021

@tigrannajaryan Sorry for the delay - I added snippets following your form as well; how does it look? Also feel free to make edits directly to the branch if there are any easy improvements.

@tigrannajaryan
Member

@tigrannajaryan Sorry for the delay - I added snippets following your form as well; how does it look? Also feel free to make edits directly to the branch if there are any easy improvements.

Thanks @anuraaga. I think we need to compare and see what we gain by using one or the other approach.

Some arguments in favour of SQL-like:

  • Because you can have a from clause, you can use the same expression in a processor attached to pipelines of multiple data types. I am not entirely sure this is a benefit though. I could argue the other way and say that if you attach it to a pipeline of the wrong type, it may be confusing why the expression has no effect.

Arguments in favour of Python-like:

  • Slightly more concise (fewer characters).

To comment on some of the arguments you brought up earlier:

One of the main motivations of SQL-like was to take advantage of existing tooling or muscle memory.

I think it is arguable whether SQL or Python has more tooling or muscle memory. Besides, it is not clear what tooling we can reuse. Editors with syntax highlighting perhaps, provided that the expression is in a separate file (not possible today, but maybe in the future)?

We probably don't want to provide a real programming language in these statements (or at least not yet :) ), so I'm worried the Python-derived syntax may end up too flexible.

I agree that we need to be careful with this. However, given that with both SQL-like and Python-like syntax we are only providing a severely limited subset of the language, I think we are going to be exposing roughly the same level of complexity, so it is a wash here.

We also need to make sure to keep query-engine optimizations open, e.g. combining transformations with the same where clause.

I would like to explore this more. Do we believe that SQL syntax is inherently easier to optimize for our use cases? I am not sure why that would be so. The optimizations SQL engines do are typically all about execution plans. However, in our case we have a stream of data, not data at rest, which inherently limits what sort of variations you can have in the "execution plan". I doubt that a lot of optimizations are possible; in the majority of cases we are going to be literally executing the statements as is while iterating over the data records. I would like to see a specific example where we think a particular SQL syntax is easier to optimize than the equivalent Python syntax. I am having a bit of a hard time coming up with an example myself.

This is probably not very helpful. I feel like there are not enough arguments in favour of or against either approach yet. Let me think a bit to see if I can come up with some additional arguments, and please feel free to add some yourself.

Some other topics to try to explore:

  • Is there perhaps an extended set of operations which we will want in the future and which are much better/easier to express in one syntax but not the other?
  • How does end-user troubleshooting compare for the 2 syntaxes? Can we emit much better error messages for one syntax but not the other?
  • Can we provide good tooling in the future? We could potentially implement the Language Server Protocol and Debug Adapter Protocol. Is one syntax or the other more favourable from this perspective?

@anuraaga
Contributor Author

Thanks @tigrannajaryan - I agree that the differences aren't that large and there isn't any clear winner.

I could argue the other way and say that if you attach it to a pipeline of the wrong type, it may be confusing why the expression has no effect.

I guess I am thinking in terms of forwards compatibility, which may be too much early optimization anyway. I do expect a way to process multiple signals at the same time to be useful in the future, and it would be nice if the syntax supports it. The syntax could change to support it in the future though.

I am having a bit of a hard time coming up with an example myself.

It's true that this probably ties into the previous section: if the language exposes too much complexity, then it becomes harder to optimize. But if the complexity is kept down even for a Python-like language, then the optimization possibilities should be similar. Both would probably be frontends into the same AST backend that actually runs the commands.
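
To make that concrete, here is a rough sketch (all names are illustrative assumptions) of the kind of shared representation both frontends could compile into; the backend only ever sees conditions and operations, regardless of the surface syntax:

// Telemetry is a stand-in for whatever pdata item a statement operates on.
type Telemetry interface{}

// compiledStatement is what either frontend (SQL-like or Python-like) would
// produce: a condition plus the operation to run when the condition matches.
type compiledStatement struct {
	condition func(Telemetry) bool
	operation func(Telemetry)
}

// execute runs every matching statement against each piece of telemetry,
// independent of which surface syntax produced the statements.
func execute(statements []compiledStatement, items []Telemetry) {
	for _, item := range items {
		for _, s := range statements {
			if s.condition(item) {
				s.operation(item)
			}
		}
	}
}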

I guess from reading them, and maybe it's just personal preference, the SQL form seems to map more precisely to single statements. if foo: do feels like the do should be able to be as many statements as needed; on the flip side, I think supporting multiple commands for a single where would look awkward. Whether multiple statements for a condition should be supported in the language could be a decision point - the simplicity of single statements appeals to me in some sense, but in practice there may be so many situations where multiple commands are useful that they should be modeled into the language.

@tigrannajaryan
Member

I guess from reading them, and maybe it's just personal preference, the SQL form seems to map more precisely to single statements. if foo: do feels like the do should be able to be as many statements as needed; on the flip side, I think supporting multiple commands for a single where would look awkward. Whether multiple statements for a condition should be supported in the language could be a decision point - the simplicity of single statements appeals to me in some sense, but in practice there may be so many situations where multiple commands are useful that they should be modeled into the language.

You are right, potentially with Python-like syntax we can allow:

if condition:
  statement1
  statement2
  ....

@tigrannajaryan
Member

I guess I am thinking in terms of forwards compatibility, which may be too much early optimization anyway. I do expect a way to process multiple signals at the same time to be useful in the future, and it would be nice if the syntax supports it. The syntax could change to support it in the future though.

@anuraaga Can you perhaps show an example of what you mean by this? I feel that I may be missing some nice way to support multiple signals that you are envisioning.

@anuraaga
Contributor Author

anuraaga commented Dec 13, 2021

Can you perhaps show an example of what you mean by this? I feel that I may be missing some nice way to support multiple signals that you are envisioning.

@tigrannajaryan "A way to support multiple signals"? Nah I don't have that :P Would be a large rewrite in the pipeline I think. I do have a desire though. I think there are many cross-signal use cases like "Drop all health check telemetry", "Redact auth header on all telemetry", "Reduce cardinality for all telemetry of certain attribute in same way". So I am hoping that at least a future configuration language allows this.

Ah, if you meant syntax-wise, then I was thinking

delete(attributes["http.request.header.authorization"]) from traces where name = "GET /cats"
delete(attributes["http.request.header.authorization"]) where name = "GET /cats"

The former is only traces, the latter is for all telemetry. Explicitly requiring the signal name in the statement means that in the latter it could be omitted when applying to all telemetry. Even now it could be: the same instantiated transform processor could be applied to all signals.

@tigrannajaryan
Member

The former is only traces, the latter is for all telemetry. Explicitly requiring the signal name in the statement means that in the latter it could be omitted when applying to all telemetry. Even now it could be: the same instantiated transform processor could be applied to all signals.

I see. You can achieve it today by either including the processor in a particular pipeline or not. This does not seem to enable any new interesting functionality that is not possible with the current philosophy of using a processor in one or more pipelines as applicable.

I was wondering if it could somehow enable more interesting scenarios where the signal type gets converted in the middle of the pipeline, so that one pipeline temporarily happens to hold more than one signal type, on which you execute statements selectively using the "from" clause. This would essentially mean a pipeline can contain data of multiple signals. I can see how in that case the "from" clause would be really necessary. However, I am having a hard time imagining how exactly we could change the Collector's pipelines to work like that.

Anyway, I was just trying to find stronger arguments in favour of having the "from" clause, but I do not seem to be able to :-)

@alolita
Member

alolita commented Dec 16, 2021

@punya can you review the list of examples and add any missing ones?

// Assuming this is not in "core"
processors.register("replace_wildcards", replace_wildcards)

func replace_wildcards(pattern regexp.Regexp, replacement string, path processors.TelemetryPath) processors.Result {
Member

@tigrannajaryan tigrannajaryan Dec 16, 2021

How does the string in the call get converted to a regexp.Regexp here? The example above shows:
replace_wildcards("/user/*/list/*", "/user/{userId}/list/{listId}", attributes["http.target"])
Is reflection used for every invocation of replace_wildcards to figure out that "/user/*/list/*" must be compiled into a regexp.Regexp? Or is it the responsibility of the compiler to figure this out?

Related to this: do we adopt weak typing, where you can pass a string when a regexp is expected? Weak typing may be more complicated to optimize (i.e. to precompile regexp patterns once), so it may be worth thinking about this aspect of the language.

Do we intend to make the compiler smart enough to figure out that the regexp compilation can be done ahead of time and not for every invocation for every span? That's a fairly big ask for a simple compiler, but without it the execution can be quite slow.

Contributor Author

@anuraaga anuraaga Dec 16, 2021

Great callout - while writing the POC I realized the functions need to be factories of the actual logic function.

(a bit of shorthand)

func replace_wildcards(pattern string, replacement string) func(Span, Resource) {
  // Compile the regex once, at config time.
  r := regexp.MustCompile(pattern)
  return func(span Span, resource Resource) {
    doReplace(r, replacement, span, resource)
  }
}

func doReplace(r *regexp.Regexp, replacement string, span Span, resource Resource) {
  // ... perform the replacement on the telemetry ...
}

The UX for defining a function is a bit reduced but seems quite reasonable. And with this it will be possible to use reflection to convert types passed into the factory, so instead of string it could be defined to accept a regex. The framework would only need to reflect, convert the regex, and invoke the factory once, during config parse time.

I'll add a note in the doc about this.
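
A rough sketch of that config-parse-time step (function and type names here are assumptions, not the prototype's code): the framework inspects the factory's parameter types once, converts string arguments such as a regex pattern accordingly, and then calls the factory a single time.

import (
	"reflect"
	"regexp"
)

// invokeFactory converts the statement's string arguments to the types the
// factory declares (compiling regex patterns once) and calls it exactly once
// at config parse time. Purely illustrative.
func invokeFactory(factory interface{}, args []string) []reflect.Value {
	fn := reflect.ValueOf(factory)
	in := make([]reflect.Value, len(args))
	for i, arg := range args {
		switch fn.Type().In(i) {
		case reflect.TypeOf((*regexp.Regexp)(nil)):
			// The factory asked for *regexp.Regexp, so compile now, not per span.
			in[i] = reflect.ValueOf(regexp.MustCompile(arg))
		default:
			in[i] = reflect.ValueOf(arg)
		}
	}
	return fn.Call(in)
}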

Contributor Author

BTW, is this the same smartness you're referring to? 😅 I think it achieves some balance: the factory / logic function split makes the optimization easier without making the compiler too complex.

Member

@tigrannajaryan tigrannajaryan Dec 16, 2021

The factory approach requires that some arguments are compile-time constants. We lose the ability to use a value computed at runtime as the regular expression, for example. It also requires us to declare which of the function parameters are compile-time parameters and which are runtime parameters. In this example the first 2 arguments of replace_wildcards are compile-time constants.

What if I want to replace a span attribute by the value of another attribute and not by a constant string? That doesn't seem to be possible. To be honest, I have reservations about the approach you suggest. I think it limits the expressiveness of the language. Implementation-wise this approach deviates from how language compilers and VMs are typically implemented, and it may be difficult to fix without major rewrites in the future.

An alternative approach is to have a Regexp type as a first-class citizen in the language. So you can do this instead:
replace_wildcards(regexp("/user/*/list/*"), "/user/{userId}/list/{listId}", attributes["http.target"])
where regexp is a function that takes a string and returns a Regexp. Then a sufficiently smart compiler can perform the Regexp value computation at compile time and avoid calling the regexp() function at run time if the argument to regexp() is a compile-time constant. See how our current expr evaluator added support for that: https://github.com/antonmedv/expr/pull/96/files
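
For illustration, the constant-folding pass could look roughly like this (the AST node types are made up for the example and are not the expr library's API): if regexp() receives a string literal, it is evaluated at compile time and replaced with a constant; otherwise it stays a runtime call.

import "regexp"

// Minimal made-up AST node types for the sketch.
type node interface{}
type stringLiteral struct{ value string }
type constant struct{ value interface{} }
type call struct {
	fn   string
	args []node
}

// foldRegexp replaces regexp("literal") with a precompiled constant; calls
// whose argument is not a literal are left for runtime evaluation.
func foldRegexp(n node) (node, error) {
	c, ok := n.(*call)
	if !ok || c.fn != "regexp" || len(c.args) != 1 {
		return n, nil
	}
	lit, ok := c.args[0].(*stringLiteral)
	if !ok {
		return n, nil // argument computed at runtime
	}
	re, err := regexp.Compile(lit.value)
	if err != nil {
		return nil, err
	}
	return &constant{value: re}, nil
}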

We have another unclear moment with the 3rd parameter of replace_wildcards. It is not clear how attributes["http.target"] becomes a TelemetryPath (or a Span and a Resource pair that is passed to doReplace). Is the semantic of the language that attributes["http.target"] is an lvalue of type TelemetryPath that can be either evaluated to get the value of or can be passed as a reference to a function to store a value into?

Contributor Author

The factory approach requires that some arguments are compile-time constants. We lose the ability to use a value computed at runtime as the regular expression, for example.

Could you describe an example of a regex computed at runtime? As this is a static config (well, possibly dynamically updated via a remote), I figured that by nature the config elements, such as a regex, would be static.

is an lvalue of type TelemetryPath that can be either evaluated to get the value of or can be passed as a reference to a function to store a value into?

Yeah - since it's not a string, it is parsed as a telemetry path. If the function's argument in that position does not accept a telemetry path, it would be a configuration error and cause a warning. It's true that currently only telemetry paths are runtime values - I think this hits all of the use cases for telemetry transformation (the only runtime data is the telemetry itself), but let me know what other ones there are that may need something more.

Currently the value type is a literal, a path expression, or an invocation of a function:

https://github.com/anuraaga/opentelemetry-collector-contrib/pull/1/files#diff-3dcf16cd5cabbe637313a07c03600765fd4b043e429d66b9ef024f62f3adbd51R24

It would be relatively easy to expand support for special syntax, e.g. /regex/ or even regexp("foo"), if it's useful. That being said, I don't think we'd avoid factories (though they wouldn't necessarily need to be required). We probably can't support all types of config-time resolved values. For keep, we need to convert a slice of strings into a map for quick presence tests:

https://github.com/anuraaga/opentelemetry-collector-contrib/pull/1/files#diff-d38eee897f4ad52c000a3df17502753a839f5afb2f7bdf3f573807cea2588561R26

There doesn't seem to be a way to handle this generically, but the factory approach seems to work well for it.
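
For example (a sketch with assumed names, mirroring the prototype's idea rather than its exact code), the keep factory can turn the configured key list into a set once, so the returned per-telemetry function only does map lookups:

// keepFactory builds the set of attribute keys to keep at config time and
// returns the function that is run against each attribute map.
func keepFactory(keys []string) func(attributes map[string]interface{}) {
	keep := make(map[string]struct{}, len(keys))
	for _, k := range keys {
		keep[k] = struct{}{}
	}
	return func(attributes map[string]interface{}) {
		for k := range attributes {
			if _, ok := keep[k]; !ok {
				delete(attributes, k)
			}
		}
	}
}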

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

Member

@bogdandrutu bogdandrutu left a comment

Merging this as a draft. After we progress with the implementation we may update it.

@bogdandrutu bogdandrutu enabled auto-merge (squash) January 25, 2022 19:23
@tigrannajaryan
Member

@anuraaga I found a couple of interesting alternative approaches that we could probably look into:
https://github.com/max-sixty/prql
https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/samples?pivots=azuredataexplorer

I think the syntax of either of these is nicer than SQL. I like that the order of syntax elements matches the order of logical execution.

@anuraaga
Contributor Author

@tigrannajaryan Thanks a lot for the pointers! I will look through those to see what we can take from them. Do you think we should continue to iterate on this PR or follow up with another one?

@tigrannajaryan
Member

Do you think we should continue to iterate on this PR or follow up with another one?

Works either way for me. Whatever you prefer.

@anuraaga
Contributor Author

This has been out for a while, so I think it'll be good to get this in as a draft and iterate on this / the initial implementation together, if that's OK.

@tigrannajaryan tigrannajaryan added Skip Changelog PRs that do not require a CHANGELOG.md entry discussion-needed Community discussion needed and removed discussion-needed Community discussion needed labels Jan 27, 2022
@tigrannajaryan tigrannajaryan merged commit 32301d9 into open-telemetry:main Jan 27, 2022