Init support for filebeat multiline #570
Conversation
ForceCloseFiles bool `yaml:"force_close_files"`
ExcludeLines []string `yaml:"exclude_lines"`
IncludeLines []string `yaml:"include_lines"`
Multiline *MultilineConfig `yaml:"multi"`
I'd call the option multiline to make it clearer what it is about.
Force-pushed from 02bdb18 to 40e851f
Negate bool `yaml:"negate"`
Match string `yaml:"match"`
MaxLines *int `yaml:"max_lines"`
MaxBytes *int `yaml:"max_bytes"`
I would suggest having max_bytes outside multiline as a general configuration, since it not only applies to multiline but could also apply to a very large single line.
Good point. Will do. Same for max_bytes.
What do you mean by 'same for max_bytes'?
I meant to say max_lines... but max_lines makes no sense there. I'll just do it for max_bytes. Sorry for the noise.
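For reference, a minimal sketch of the config layout these two threads seem to converge on: max_bytes as a general harvester-level option, max_lines staying inside multiline, and the key renamed from multi to multiline. The enclosing struct name, the package name and the exact set of fields shown are assumptions for illustration, not the PR's actual code.

```go
package config

// Illustrative only: HarvesterConfig is an assumed name; unrelated fields omitted.
type HarvesterConfig struct {
	ExcludeLines []string `yaml:"exclude_lines"`
	IncludeLines []string `yaml:"include_lines"`

	// max_bytes as a general option, so it also caps very large single lines
	MaxBytes *int `yaml:"max_bytes"`

	// key renamed from "multi" to "multiline", as suggested above
	Multiline *MultilineConfig `yaml:"multiline"`
}

// Pattern and other multiline fields omitted here.
type MultilineConfig struct {
	Negate   bool   `yaml:"negate"`
	Match    string `yaml:"match"`
	MaxLines *int   `yaml:"max_lines"` // stays multiline-specific
}
```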
@urso Thanks a lot for getting this started. I had a closer look, but didn't test it yet locally. Some notes:
Error handling did not change. But I found error handling and retry functionality to be too intertwined. I did try to build "small" composable units, each responsible for just one piece of functionality when reading lines. I found there is no 'higher' level without an abstraction leak, or without making all the multiline handling much more intertwined with the error handling. Processing stages (the pipeline is set up as described below):
The logFileReader implements error handling at the lower level, so the state machines in LineReader and multilineReader have sane error handling, and the timeout reader can be safely shut down if the error is not nil. Having the logFileReader also removes the requirement that readers must still be valid after EOF.
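As an aside, here is a minimal sketch of the "small composable units" idea, assuming a Next()-style line-reading interface; the names and the interface shape are illustrative, not the PR's actual code. Each stage implements the same interface and does exactly one job, so stages can be stacked freely.

```go
package reader

// lineReader is the interface every processing stage implements. Next returns
// the next line, the number of raw bytes consumed, and an error (e.g. io.EOF
// or a timeout) that the caller decides how to handle.
type lineReader interface {
	Next() (line []byte, size int, err error)
}

// stripNewline is one single-purpose stage: it wraps another lineReader and
// removes the trailing '\n' (and '\r') from every line it forwards.
type stripNewline struct {
	in lineReader
}

func (p *stripNewline) Next() ([]byte, int, error) {
	line, size, err := p.in.Next()
	if err != nil {
		return nil, size, err
	}
	for len(line) > 0 && (line[len(line)-1] == '\n' || line[len(line)-1] == '\r') {
		line = line[:len(line)-1]
	}
	return line, size, nil
}
```

A multiline reader or a timeout reader would be just another stage wrapping a lineReader, which is what allows the error handling to stay concentrated in the input layer underneath.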
In this PR I've implemented a layered approach:
1. input layer
2. line processing layer
3. event publisher layer (for loop driver)
Layers 1 and 2 use the io.Reader interface, and layers 2 and 3 use the lineReader interface. By swapping the input layer, all processing layers can be reused for different inputs (e.g. TCP/unix socket or stdin). I think layering to deal with complexity, plus composing processing elements via interfaces, is a quite powerful and easy-to-use mechanism that fits well with the I/O interfaces in the Go stdlib. Some more refactoring + cleanup in the harvester is required in general.

It is the input layer which must behave differently for different source types. For log files we want backoff + retry on EOF, but not for sockets or stdin. It's up to the input layer to push bytes until some well-defined EOF is reached. For log files the input layer is implemented by the logFileReader.

There is no real error handling in the line readers, but I had to move some error handling to the input layer, so the stateful processors in layer 2 are not affected by error handling/recovery in layer 3.

I think another advantage of having the logFileReader handle the reading from files is that we're more flexible in adding file reading features.
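To make the layering concrete, here is a rough, self-contained sketch of how the three layers could be wired together. The type and constructor names are placeholders, not the PR's actual API; for log files the layer-1 stand-in would be the logFileReader with backoff/retry on EOF.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// Layer 2: line processing on top of any io.Reader (file, socket, stdin).
type lineScanner struct{ s *bufio.Scanner }

func newLineScanner(in io.Reader) *lineScanner {
	return &lineScanner{s: bufio.NewScanner(in)}
}

func (r *lineScanner) Next() (string, error) {
	if r.s.Scan() {
		return r.s.Text(), nil
	}
	if err := r.s.Err(); err != nil {
		return "", err
	}
	return "", io.EOF
}

func main() {
	// Layer 1 stand-in: any io.Reader. For log files this would be the
	// logFileReader (backoff/retry on EOF); for other inputs a net.Conn or os.Stdin.
	input := strings.NewReader("line one\nline two\n")

	// Layer 3: the event publisher layer is just a for loop driving the
	// pipeline until EOF; it never needs to know what kind of input sits below.
	lines := newLineScanner(input)
	for {
		line, err := lines.Next()
		if err != nil {
			break // io.EOF or a real error, handled by the caller
		}
		fmt.Println(line)
	}
}
```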
codec encoding.Encoding,
bufferSize int,
maxBytes int,
readerConfig logFileReaderConfig,
Should maxBytes be part of logFileReaderConfig? Same for bufferSize and codec?
logFileReaderConfig is the input layer config (file reading only). The other options codec, bufferSize and maxBytes are used to configure the line processing layer.
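A small sketch of that split with one config per layer; the type and field names are illustrative (and the codec is assumed to be a golang.org/x/text/encoding.Encoding), so this is not the PR's actual code.

```go
package reader

import "golang.org/x/text/encoding"

// Input layer config: file reading only (backoff, retry, close behaviour, ...).
type logFileReaderConfig struct {
	// file-reading options only
}

// Line processing layer config: decoding, buffering and line-length limits.
type lineProcessorConfig struct {
	Codec      encoding.Encoding
	BufferSize int
	MaxBytes   int
}
```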
Force-pushed from 478dfd0 to bf5d9f4
Force-pushed from bf5d9f4 to 44e2623
====== pattern

The regexp Pattern that hast to be matched. The example pattern matches all lines starting with [
Typo: s/hast/has/
I think Pattern can be lower-cased here. Do you think we should say that it is the same syntax as Perl, or link to https://github.com/google/re2/wiki/Syntax?
We're using CompilePOSIX to compile the regular expressions. These are 'mostly' compatible with egrep, plus some quirks (see documentation).
TBH I find it quite difficult to understand the impact on syntax when given this line in the documentation:
restricts the regular expression to POSIX ERE (egrep) syntax
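For illustration, a small standalone example of what CompilePOSIX gives you; the log lines are made up, and the pattern is the "starts with [" example from the docs.

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// CompilePOSIX restricts the syntax to POSIX ERE (egrep-like):
	// Perl extensions such as \d or non-greedy matching are rejected.
	pattern, err := regexp.CompilePOSIX(`^\[`)
	if err != nil {
		panic(err)
	}

	// Matches lines starting with '[', e.g. a bracketed timestamp.
	fmt.Println(pattern.MatchString("[2015-12-07 10:00:00,123] some log line")) // true
	fmt.Println(pattern.MatchString("    at some.java.Method(File.java:42)"))   // false
}
```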
Even worse, you have to keep YAML string escaping in mind when writing the regular expression.
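A self-contained sketch of that pitfall, assuming the config is parsed with gopkg.in/yaml.v2: inside a double-quoted YAML string the backslash must itself be escaped, while a single-quoted string keeps it literal; an unescaped backslash inside double quotes may even be rejected by the parser as an unknown escape.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v2"
)

func main() {
	var cfg struct {
		Double string `yaml:"pattern_double"`
		Single string `yaml:"pattern_single"`
	}

	// Both keys end up holding the same regular expression: ^\[
	src := []byte(`
pattern_double: "^\\["
pattern_single: '^\['
`)

	if err := yaml.Unmarshal(src, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("%q %q\n", cfg.Double, cfg.Single) // "^\\[" "^\\[" (both decode to ^\[)
}
```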
That also relates to the regexp expressions for exclude and include, right? If yes, I suggest creating a separate page with regexp docs that we can link to.
Yes, we use the same compile function.
Should we open an issue for that, so we don't forget about it, and discuss it with @dedemorton?
LGTM

====== match

Match can be set to "after" or "before". It is used to define if lines should be append to a pattern
s/append/appended/
# for Java Stack Traces or C-Line Continuation
#multiline:

# The regexp Pattern that hast to be matched. The example pattern matches all lines starting with [
s/hast/has/
Force-pushed from 44e2623 to ad19296
split processing into 3 layers:
- input layer
- line processing layer
- event publisher layer (for loop driver)

Input layer is responsible for reading files and forwarding errors if appropriate.

new multiline system tests:
- elasticsearch log with java exception
- c-style log tests
- max_lines test
- max_bytes test
- timeout test

added asciidocs for multiline
Force-pushed from ad19296 to 0d147f7
Init support for filebeat multiline
start with multiline for filebeat (#461)