Proposal for language additions to OTTL #30800
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
@jsuereth thanks for taking the time to dive into OTTL. There is a lot to comprehend here and it is going to take some time for me to ingest everything. I think this would be a very good topic for a Collector SIG meeting. |
@jsuereth since this isn't a PR yet, I'm going to supply some feedback on https://github.com/jsuereth/ottl-proposal/blob/main/PROPOSAL.md here.
Feedback on the complaints section:
I believe all boolean or mathematical expressions can have Boolean values can now be used in conditions directly: #20911. Do you have an example where this is not working? See #26108 for the cache and time values.
Currently
We've recently added a feature so that indexing fields that cannot be indexed throws an error on startup.
Feedback on the proposal section:
|
For the comparison to existing OTTL syntax, I'd like to submit my attempt at parsing a JSON log to specification. I probably misinterpreted some stuff, so please let me know if I've done something wrong:
Input:
{
  "severity":"ERROR",
  "message":"There was an error in the application.",
  "httpRequest":{
    "requestMethod":"GET"
  },
  "times":"2020-10-12T07:20:50.52Z",
  "logging.googleapis.com/insertId":"42",
  "logging.googleapis.com/labels":{
    "user_label_1":"value_1",
    "user_label_2":"value_2"
  },
  "logging.googleapis.com/operation":{
    "id":"get_data",
    "producer":"github.com/MyProject/MyApplication",
    "first":"true"
  },
  "logging.googleapis.com/sourceLocation":{
    "file":"get_data.py",
    "line":"142",
    "function":"getData"
  },
  "logging.googleapis.com/spanId":"000000000000004a",
  "logging.googleapis.com/trace":"projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824",
  "logging.googleapis.com/trace_sampled":false
}
transformprocessor config:
transform:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - merge_maps(cache, ParseJSON(body), "upsert") where IsMatch(body, "^\\{")
        - set(time, Time(cache["times"], "%Y-%m-%dT%H:%M:%S.%f%z")) where cache["times"] != nil
        - set(severity_number, SEVERITY_NUMBER_DEBUG) where cache["severity"] == "DEBUG"
        - set(severity_number, SEVERITY_NUMBER_INFO) where cache["severity"] == "INFO"
        - set(severity_number, SEVERITY_NUMBER_WARN) where cache["severity"] == "WARNING"
        - set(severity_number, SEVERITY_NUMBER_ERROR) where cache["severity"] == "ERROR"
        - set(severity_number, SEVERITY_NUMBER_FATAL) where cache["severity"] == "CRITICAL"
        - set(attributes["gcp.http_request"], cache["httpRequest"]) where cache["http_request"] != nil
        - set(span_id.string, cache["logging.googleapis.com/spanId"]) where cache["logging.googleapis.com/spanId"] != nil
        - replace_pattern(cache["logging.googleapis.com/trace"], "projects/.*/traces/([\\w\\d]+)", "$$1") where cache["logging.googleapis.com/trace"] != nil
        - set(trace_id.string, cache["logging.googleapis.com/trace"]) where cache["logging.googleapis.com/trace"] != nil
        - delete_key(cache, "times")
        - delete_key(cache, "severity")
        - delete_key(cache, "httpRequest")
        - delete_key(cache, "logging.googleapis.com/trace")
        - delete_key(cache, "logging.googleapis.com/spanId")
        - flatten(cache, prefix="gcp")
        - merge_maps(attributes, cache, "insert")
output:
Resource SchemaURL:
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-01-26 23:21:16.374373 +0000 UTC
Timestamp: 2020-10-12 07:20:50.52 +0000 UTC
SeverityText:
SeverityNumber: Error(17)
Body: Str({"severity":"ERROR","message":"There was an error in the application.","httpRequest":{ "requestMethod":"GET"},"times":"2020-10-12T07:20:50.52Z","logging.googleapis.com/insertId":"42","logging.googleapis.com/labels":{ "user_label_1":"value_1", "user_label_2":"value_2"},"logging.googleapis.com/operation":{ "id":"get_data", "producer":"github.com/MyProject/MyApplication", "first":"true"},"logging.googleapis.com/sourceLocation":{ "file":"get_data.py", "line":"142", "function":"getData"},"logging.googleapis.com/spanId":"000000000000004a","logging.googleapis.com/trace":"projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824","logging.googleapis.com/trace_sampled":false})
Attributes:
-> logging.googleapis.com/spanId: Str(000000000000004a)
-> times: Str(2020-10-12T07:20:50.52Z)
-> logging.googleapis.com/sourceLocation: Map({"file":"get_data.py","function":"getData","line":"142"})
-> httpRequest: Map({"requestMethod":"GET"})
-> logging.googleapis.com/insertId: Str(42)
-> logging.googleapis.com/labels: Map({"user_label_1":"value_1","user_label_2":"value_2"})
-> message: Str(There was an error in the application.)
-> logging.googleapis.com/operation: Map({"first":"true","id":"get_data","producer":"github.com/MyProject/MyApplication"})
-> logging.googleapis.com/trace_sampled: Bool(false)
-> log.file.name: Str(gcp.json)
-> severity: Str(ERROR)
-> logging.googleapis.com/trace: Str(projects/my-projectid/traces/06796866738c859f2f19b7cfb3214824)
-> gcp.logging.googleapis.com/operation.id: Str(get_data)
-> gcp.logging.googleapis.com/operation.producer: Str(github.com/MyProject/MyApplication)
-> gcp.logging.googleapis.com/operation.first: Str(true)
-> gcp.message: Str(There was an error in the application.)
-> gcp.logging.googleapis.com/insertId: Str(42)
-> gcp.logging.googleapis.com/labels.user_label_1: Str(value_1)
-> gcp.logging.googleapis.com/labels.user_label_2: Str(value_2)
-> gcp.logging.googleapis.com/trace_sampled: Bool(false)
-> gcp.logging.googleapis.com/sourceLocation.file: Str(get_data.py)
-> gcp.logging.googleapis.com/sourceLocation.line: Str(142)
-> gcp.logging.googleapis.com/sourceLocation.function: Str(getData)
Trace ID: 06796866738c859f2f19b7cfb3214824
Span ID: 000000000000004a
Flags: 0
{"kind": "exporter", "data_type": "logs", "name": "debug"}
I post this only for a more fair syntax comparison because I do not believe the one in the proposal is optimized. |
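To make the repetition in the config above concrete: the five severity statements amount to a single lookup from the GCP severity string to an OTLP severity number. Here is a rough sketch of that lookup in Rust. This is not OTTL and not collector code; `severity_number` is a hypothetical helper, and the numeric values are the OTLP severity numbers for DEBUG, INFO, WARN, ERROR and FATAL (the `Error(17)` in the output above corresponds to the `"ERROR"` arm).

```rust
// Hypothetical helper, for illustration only: maps a GCP structured-logging
// severity string to an OTLP severity number, mirroring the five
// `set(severity_number, ...) where cache["severity"] == ...` statements.
fn severity_number(gcp_severity: &str) -> Option<u8> {
    match gcp_severity {
        "DEBUG" => Some(5),     // SEVERITY_NUMBER_DEBUG
        "INFO" => Some(9),      // SEVERITY_NUMBER_INFO
        "WARNING" => Some(13),  // SEVERITY_NUMBER_WARN
        "ERROR" => Some(17),    // SEVERITY_NUMBER_ERROR
        "CRITICAL" => Some(21), // SEVERITY_NUMBER_FATAL
        _ => None,              // anything else leaves the severity unset
    }
}
```

In today's OTTL each arm has to be its own statement with its own `where` clause, which is the kind of duplication an expression-level construct would remove.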
I'll also admit that my knee-jerk reaction is to be defensive, haha. I really appreciate the time you've spent on this proposal and your desire to improve OTTL. I promise to keep an open mind! |
First, I want to thank you for providing more optimal code for the use case! What I posted was our naive attempt with our current understanding of OTTL. It seems good that things can be simplified, and I think some of our points were perhaps bugs that are now fixed as opposed to assumptions on syntax. That said, I still think expanding the available "expression" syntax and providing features like pattern matching or structural literals would be ideal for the types of programs you expect in OTTL. I'm going to answer your questions over the next few days as I can, in separate comments, to hopefully have a good conversation and keep different points and features in separate conversations. My proposal is here to help. While I stand behind all of it, we can try different pieces initially. I'm mostly looking to make sure that when OTTL is stable it is scalable. This isn't meant to be an attack on the current state, more a vision of where we can grow, and grow (hopefully) without breaking existing users. |
This is my idea of a merge operation. The second is meant to be a mechanism of deleting multiple values from the JSON body. The first example does the following:
The ability to chain the merges keeps the syntax brief but also visually you can see the key-value pairs being set. |
Pattern matching is designed to do two things:
The goal of this addition is not just sharing the conditional but also gaining a term (name? reference?) you can use that has some desired type or feature.
Example from Scala:
case class Data(x: Int, y: String)
val x: Any = ...
x match {
  case Data(_, name) => // here I can use "name" as a String
}
Example from Rust:
enum Option<T> {
  None,
  Some(T),
}
match opt_value {
  None => handle_none(),
  Some(value) => handle_value(value),
}
The idea here is that you can chain these patterns too, to dramatically simplify both checking a condition and adding a new term. For OTTL, I imagine we'd use this to verify that a log body AnyVal has specific fields of specific types, and to give those fields names we can use in the program.
Here the user selected Hope that helps! Let me know if you need more details or examples from other languages. |
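As a rough illustration of "verify the log body has specific fields of specific types and give them names", here is a minimal Rust sketch. It is not the proposed OTTL syntax: the `AnyValue` enum is a hypothetical stand-in for the OTLP AnyValue, and `severity_and_message` is a made-up name. The point is only that one match both checks the shape and binds `sev` and `msg`.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for OTLP's AnyValue; not the collector's actual type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum AnyValue {
    Str(String),
    Int(i64),
    Bool(bool),
    Map(BTreeMap<String, AnyValue>),
}

// Succeeds only if `body` is a map whose "severity" and "message" fields are
// both strings; on success the two values are bound to names in one step.
fn severity_and_message(body: &AnyValue) -> Option<(&str, &str)> {
    // First pattern: the body must be a map at all.
    let AnyValue::Map(fields) = body else {
        return None;
    };
    // Second pattern: both fields must exist and both must be strings.
    match (fields.get("severity"), fields.get("message")) {
        (Some(AnyValue::Str(sev)), Some(AnyValue::Str(msg))) => {
            Some((sev.as_str(), msg.as_str()))
        }
        _ => None,
    }
}
```

With today's OTTL the same intent needs a `where` clause for the check and then repeated `cache[...]` lookups in the statement; here the check and the names come from the same pattern.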
Absolutely, I was just highlighting my own shortcomings, haha. We share the goal of making OTTL scalable and stable. Thanks again for thinking about this! |
@jsuereth in the examples where is the variable |
@jsuereth I'd be really interested in seeing the examples from https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor#examples rewritten with this new syntax. I am curious what non-log/non-JSON statements look like and how they'll look in a YAML config. |
This is the pattern matching component:
The name |
Oh I see now. I was interpreting |
I'm going to remove |
It is a chain of function calls. Scala is the language I'm borrowing from heavily here. Rust also has pattern matching with extraction, but doesn't expand to allow custom functions. |
Tried to provide what examples might look like with proposed language features: https://github.com/jsuereth/ottl-proposal/blob/main/examples.md Note: If you think it'd be worthwhile for me to tailor language feature proposals into components and what things might look like incrementally adding one feature at a time, let me know. E.g. there are examples I felt pattern matching helps the most, others list-comprehension. While I do think all the language features would benefit OTTL, I'm happy to entertain a subset of this to contribute. |
I am confused by the |
In the context of pattern matching, json is effectively an output. We have a set of functions/methods called "extractors" which take an input and return So, for example, if i have a symbol
here Note: The simplest form of pattern matching just lets you extract structure arguments by name, but without any of these fancy function things. I think given JSON + Logging and the need to deal with AnyVal all the time, this feature can be useful. Particularly if adding new "patterns" is as easy as a function with optional return. |
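To sketch what "a function with optional return" could look like as an extractor, here is a small Rust example. Everything in it is an assumption for illustration: `gcp_trace_id`, `span_id` and `trace_context` are hypothetical names, and the hex-only check is stricter than the `[\w\d]+` regex used in the OTTL config earlier in this thread. Each extractor either yields the piece it matched or `None`, and chaining them means the overall match fails as soon as any one of them does.

```rust
// Hypothetical extractor: succeeds only for the GCP trace path shape
// "projects/<project>/traces/<trace id>" and yields the trace id.
fn gcp_trace_id(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("projects/")?;
    let (_project, trace_id) = rest.split_once("/traces/")?;
    if !trace_id.is_empty() && trace_id.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(trace_id)
    } else {
        None
    }
}

// Hypothetical extractor: accepts a 16-character hex span id unchanged.
fn span_id(s: &str) -> Option<&str> {
    (s.len() == 16 && s.chars().all(|c| c.is_ascii_hexdigit())).then_some(s)
}

// Chaining: both extractors must succeed for the names to be bound.
fn trace_context<'a>(trace: &'a str, span: &'a str) -> Option<(&'a str, &'a str)> {
    match (gcp_trace_id(trace), span_id(span)) {
        (Some(t), Some(s)) => Some((t, s)),
        _ => None,
    }
}
```

Adding a new "pattern" in this style is just adding another function that returns an `Option`, which is the property the comment above is after.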
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I haven't had much time the past few months to pursue this (due to heavy investment in codegen / semantic-conventions / weaver). I would like to pursue this again with the aim of reducing the ticket/issue count on OTTL with targeted features (or just general OTTL language specification improvement). @TylerHelmuth If you're amenable, maybe we can set up some time to chat? At a minimum I think given #32080 - The most important thing from this proposal would be to specify what |
@jsuereth If you are able to attend, this would be great to discuss at a Collector SIG meeting. We have them every Wednesday at 12:00 EDT / 09:00 PDT. Here's a link to the meeting notes with a link to the Zoom room. If this time doesn't work for you, I agree a call would be a good way to discuss our options here. |
@evan-bradley That time, unfortunately, conflicts with semconv tooling WG, so I'm unable to attend. |
We discussed briefly at OTEL community days. I'll follow up with y'all after my summer vacation/travel have passed. I think we can be highly focused on a few points of the proposal to make progress against key friction points. |
This issue has been closed as inactive because it has been stale for 120 days with no activity. |
Component(s)
pkg/ottl
Is your feature request related to a problem? Please describe.
We were attempting to use OTTL to transform Google's structured logging format in GKE (see: https://cloud.google.com/logging/docs/structured-logging) to the current example data model in OpenTelemetry (see: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model-appendix.md#google-cloud-logging).
Effectively, we're trying to parse a JSON log body and extract components into OTLP equivalents:
In doing so we noticed a lot of friction in OTTL and duplicate expressions.
Here's a "simplified" (i.e. still missing some if/where statements, and new required built-in functions for span processing) version:
Describe the solution you'd like
We'd like to propose a new expression-focused syntax for OTTL that would allow the previous OTTL to look like this:
At a high level we propose the following:
get error messages without running the collector. (e.g. Go, Rust, Typescript)
lists of "KeyValueList" i.e. Attributes.
limit, truncate_all, replace_*, keep_keys, delete_keys.
AnyValue
attributes and log bodies, reducing the need to duplicate intent between where clause and statements.
Describe alternatives you've considered
We investigated leveraging CEL or Lua for this purpose.
Unfortunately there are a few shortcomings we think this proposal would alleviate:
Additional context
I have a fully implemented prototype "trans-piler" which can take this new syntax and backport OTTL statements from it. This prototype includes grammar suggestions and rationale.
I would like to consider whether OTTL should expand its expression prior to #28892.