Add log.message #3
I would prefer I think Just my 2¢ :-)
Thanks a lot for the feedback, that is super useful. Currently, the way I differentiate between logs and metrics is that metrics are pulled on a predefined period and logs are pushed, but this line is very blurry. Based on this, the above would also be logs, I'm thinking. But I see that a user is in this case not going to look for the "raw event" under The initial reason I wanted to get it out of the event prefix was that it felt like the only object in there that does not contain meta information but actual data. I'm also OK with leaving it where it is. @webmat Out of curiosity: would you categorise operational events under logs, metrics or their own category?
@webmat Working on elastic/beats#7207 I realised
For the type of data I was dealing with, I would definitely say "events", in the sense that it was not just numerical data. I was processing email addresses, IP addresses, email headers and email categories. So full-on events with lots of juicy data ;-) On the other hand, I don't have a strong attachment to the "event" part of event.raw. It could totally be something else equally generic, like * Well, depending on how broadly you define log. But I think most people think of text files when they hear "log". And "most people" is our audience for this ;-)
We have some data sources that aren't logs. Some are database scrapes and some are API calls we make to a service and then index what comes out. Not sure Indexing the original data (before it is parsed and mutilated) as
Based on the feedback above I reverted the change to For My suggestion is now to only introduce
Yes, that part I agree with. Makes total sense, especially wrt reconstructed messages (like multiline).
schemas/log.yml
Outdated
In contrast to the `message` field which can contain an extracted part
of the log message, this field contains the raw log message and should
not be processed. It can have already some modifications like encoding
This sounds like a contradiction: 'raw log message' and 'not be processed' vs. 'can have already some modifications'.
Agree. Any suggestion for a better description instead of raw we could use here?
CHANGELOG.md
Outdated
@@ -6,6 +6,8 @@ All notable changes to this project will be documented in this file based on the

### Breaking changes

* Rename `event.raw` to `log.message`. #3
This change should be removed now that `event.raw` stays as is.
fixed, thanks for the review.
PR updated with disabling |
README.md
Outdated
@@ -275,6 +275,7 @@ Fields which are specific to log events.
| <a name="log.level"></a>`log.level` | Log level of the log event.<br/>Some examples are `WARN`, `ERR`, `INFO`. | keyword | | `ERR` |
| <a name="log.line"></a>`log.line` | Line number the log event was collected from. | long | | `18` |
| <a name="log.offset"></a>`log.offset` | Offset of the beginning of the log event. | long | | `12` |
| <a name="log.message"></a>`log.message` | This is the log message and contains the full log message before splitting it up in multiple parts.<br/>In contrast to the `message` field which can contain an extracted part of the log message, this field contains the raw log message and should not be processed. It can have already some modifications like encoding applied or new lines removed to clean up the log message.<br/>This field is not index and doc_values are disabled so it can't be queried but the value can be retrieved from `_source`. | keyword | | `Sep 19 08:26:10 localhost My log` |
"not index" -> "not indexed"
Also, with regard to my previous comment: "[..] this field contains the raw log message and should not be processed." -> "[..] this field contains the original, full log message."
Applied.
The field `log.message` contains the full log message before splitting it up in multiple parts. In contrast to the `message` field which can contain an extracted part of the log message, this field contains the original, full log message. It can have already some modifications applied like encoding or new lines removed to clean up the log message. This field is not indexed and doc_values are disabled so it can't be queried but the value can be retrieved from `_source`.
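To illustrate the distinction described above, here is a minimal sketch of what a document could look like. The `log.message` value is the example from the README diff; the extracted `message` value and the overall document shape are assumptions made only for illustration and are not taken from this PR.

```json
{
  "message": "My log",
  "log": {
    "message": "Sep 19 08:26:10 localhost My log"
  }
}
```

In this sketch, `message` holds only the extracted part, while `log.message` keeps the original, full line, which per the description would be retrievable from `_source` but not queryable.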
PR rebased, commit and PR message updated, fixes applied. Ready for another review.
@@ -459,6 +459,12 @@
"line": {
  "type": "long"
},
"message": {
  "doc_values": false,
  "ignore_above": 1024,
For logs, `ignore_above: 1024` will be too small.
I checked the docs for this and ran some tests: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
It seems like `ignore_above` does not play any role here, as the field is not indexed anyway. `_source` is not affected by `ignore_above`.
Good. Thanks for testing.
Maybe we shouldn't even set an `ignore_above` when `index: false`? I imagine others having a similar reaction to mine.
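For reference, a mapping fragment along the lines of this suggestion might look like the sketch below: a `keyword` field with `index` and `doc_values` disabled and no `ignore_above`. The surrounding structure is simplified and assumed; where exactly the fragment sits in the generated template, and whether the PR ended up dropping `ignore_above`, is not shown in this conversation.

```json
{
  "log": {
    "properties": {
      "message": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      }
    }
  }
}
```

With a mapping like this, the field cannot be searched, sorted or aggregated on, but its value is still returned as part of `_source`.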