
Process multiline logs from multiple docker images #4308

Open
kafkapre opened this issue Dec 6, 2015 · 27 comments

@kafkapre

kafkapre commented Dec 6, 2015

I run a docker container with the gelf driver and would like to collapse multiline logs in Logstash. My Logstash conf:

input {
    gelf {}
}
filter {
    multiline {
        pattern => "^%{TIMESTAMP_ISO8601}"
        negate => true
        what => "previous"
        source => "short_message"
    }
}
output {
    stdout { codec => rubydebug }
}

It works perfectly when I process logs from one docker container, but for two or more it does not work, because it collapses messages from both (or more) log streams.

I would expect that setting up multilining in the input would solve the problem.

input {
    gelf {
        codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => "previous"
        }
    }
}

but multilining does not work correctly with this setup (it seems to be a bug). Any suggestions? Thanks.

I am using: Docker 1.9.1, Logstash 2.1

@purbon
Contributor

purbon commented Dec 9, 2015

The multiline filter is not thread safe, so it is not a good way to collapse logs as you described. One improvement would be to use the multiline codec in your input section; see https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html for details.

Closing this, as it is a known problem of the multiline filter; feel free to open any other issue you might find. Keep in mind https://discuss.elastic.co/, a great source of community support.

@purbon purbon closed this as completed Dec 9, 2015
@kafkapre
Author

kafkapre commented Dec 9, 2015

Exactly, I would prefer to put multiline into input section. But, as I described above, it does not work correctly.

@purbon
Contributor

purbon commented Dec 9, 2015

Use the codec syntax; your example is wrong, to my understanding.

@kafkapre
Author

kafkapre commented Dec 9, 2015

Oh, sorry, I wrote it incorrectly. This is correct (I have also corrected it above).

input {
    gelf {
        codec => multiline {
            pattern => "^%{TIMESTAMP_ISO8601}"
            negate => true
            what => "previous"
        }
    }
}

And for this case, multilining does not work.

@jordansissel
Contributor

Hmm, I think this is a problem that the gelf input needs to get together with the IdentityMapCodec. Thoughts, @guyboertje?

@guyboertje
Contributor

I will look into it

@purbon
Contributor

purbon commented Dec 10, 2015

To my understanding, after looking at the code base https://github.com/logstash-plugins/logstash-input-gelf/blob/master/lib/logstash/inputs/gelf.rb, the gelf input declares the codec directive, but the codec is never used within the plugin. So this is the bug, to my understanding.

@kafkapre I agree there is a bug; no codec is actually working there. Can you please reopen this issue in the https://github.com/logstash-plugins/logstash-input-gelf repo? Thanks a lot for your finding.

@uschtwill

For others who might be searching for a solution to this: this worked beautifully for me: https://stackoverflow.com/questions/34075538/elk-process-multiline-logs-from-multiple-docker-images

@guyboertje
Contributor

This plugin does not use a codec because it receives a JSON string that represents an event. Codecs are meant to take a line and create an Event; filters are meant to operate on Events. The multiline codec buffers lines until some condition is seen - it would have to be changed to buffer events and match on a field, but wait - that is what the Multiline Filter does. So we would need a way to run the multiline filter after an input but before the queue. We can't do that at the moment.

@jmreicha

jmreicha commented Sep 1, 2016

@kafkapre did you ever find a workaround?

@guyboertje
Contributor

@jmreicha and other future readers:

The solution offered in the Stackoverflow link from @uschtwill IS the solution for this issue. There are no workarounds. There may well be a performance impact when using the multiline filter because, by design, it needs to receive every event. This is so because we can't ensure that events with the same identity will always travel down the same path in parallel multithreaded filter stages.
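For reference, the Stack Overflow approach boils down to keying the multiline filter on the originating container. A minimal sketch along those lines (the `container_id` field name is an assumption about what your gelf input emits, so verify it against your own events):

```
filter {
    multiline {
        pattern => "^%{TIMESTAMP_ISO8601}"
        negate => true
        what => "previous"
        source => "short_message"
        # keep a separate buffer per log stream so lines from different
        # containers are never collapsed together; container_id is assumed
        # to be a field set by the gelf input
        stream_identity => "%{host}.%{container_id}"
    }
}
```

The key difference from the config at the top of this issue is `stream_identity`: without it, the filter buffers all events in one shared stream, which is exactly why logs from two containers get interleaved.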

@jmreicha

jmreicha commented Sep 1, 2016

@guyboertje Thanks for the update. I was worried about performance but maybe it's okay? Do you have a ballpark of how many events a single thread can handle?

@guyboertje
Contributor

guyboertje commented Sep 2, 2016

@jmreicha - unfortunately, the answer to that question is very difficult to give because it depends on the version of LS and the complexity of the configuration, i.e. whether you have other filters before or after the multiline filter, the rate at which events are 'pumped' in via the input(s), and the time taken for the output to deliver an event out.

This is a fantastic presentation from Avleen at Etsy (http://www.slideshare.net/avleenvig/elk-mooseively-scaling-your-log-system) that gives very good advice on systematic tuning of the Elastic Stack.

@jmreicha

jmreicha commented Sep 2, 2016

I'll check it out, thanks.

@mathiasbn

I'm a bit confused that this issue is closed.

  • The proposed solution (the link to Stack Overflow) has performance implications.
  • The multiline filter is deprecated.
  • For some reason the multiline filter cannot be found on logstash:5.0.2-1.

@antdavidl

I'm in a similar scenario, in which I'm unable to make the multiline codec work properly with the gelf plug-in in ELK 5.x.

The proposed solution uses the multiline filter with the stream_identity parameter, but the multiline filter is deprecated and unavailable now, so I think we are back at the starting point :-(

@maxblaze

maxblaze commented Apr 8, 2017

Bump. This really needs to be reopened.

@danielmotaleite

Logstash is the wrong place to fix multiline events; you need to fix them as early as possible, inside the app or just after docker:

elastic/beats#918

I hope someone implements the new docker logger beat plugin with multiline support, as in filebeat

@maxblaze

@danielmotaleite I completely disagree. One of the original killer features of Logstash was the ability to handle multiline events.

@jordansissel
Contributor

@maxblaze I'll reopen this.

For folks curious about the future, there's some problems to discuss:

  1. The multiline filter is gone and probably not coming back. For performance, Logstash will process things out of order (when using multiple pipeline worker threads), which means multiline filtering, even if present, would harm performance.
  2. The aggregate filter can do what the multiline filter did. It has the same single-threaded, ordered-processing limitations as the multiline filter did.
  3. It could be possible to add codec support to the gelf input, but it is unclear if this will solve y'all's problems.
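To illustrate point 2, a rough sketch of the per-stream buffering idea with the aggregate filter is below. This is not a drop-in multiline replacement - it merges everything a container emits within the timeout window rather than splitting on a timestamp pattern - and the `container_id` field is an assumption about what the gelf input provides:

```
filter {
  aggregate {
    # one buffer per log stream (container_id assumed to come from the gelf input)
    task_id => "%{container_id}"
    code => "
      map['message'] ||= ''
      map['message'] += event.get('message').to_s + \"\n\"
      event.cancel
    "
    # flush each container's buffer as a single merged event after 5s of silence
    push_map_as_event_on_timeout => true
    timeout => 5
    timeout_task_id_field => "container_id"
  }
}
```

Note that, as jordansissel says above, this only behaves correctly with a single pipeline worker thread.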

@jordansissel jordansissel reopened this Jul 17, 2017
@danielmotaleite

With filebeat being able to automatically process docker logs (via the json-file plugin) and to manage and merge multiline content across multiple json events, I would say again that trying to do multiline in logstash is a lost battle... either you can use only one thread to get in-order logs so it can work, or sooner or later you will get badly merged log lines or lines that are not merged. The more multi-line events you have and the more load you have, the higher the risk of broken logs.

The only sane way to solve multi-line is as close to the source as possible. If that's not possible in the app producing the logs, the next guy (filebeat, fluent, etc.) should be the one trying to do it.

@Openpalm

Openpalm commented Dec 20, 2018

Please close this. It would be nice if the elastic docs incorporated a note about this.

@jordansissel
Contributor

jordansissel commented Dec 20, 2018

@Shokodemon filebeat is the recommended solution for what you are doing. https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-docker.html
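For readers landing here, a minimal sketch of that recommendation might look like the following (the timestamp regex and the Logstash host are assumptions about your environment; see the linked docs for the full option set):

```yaml
filebeat.inputs:
- type: docker
  containers.ids:
    - '*'
  # treat any line that does not start with an ISO8601-style timestamp
  # as a continuation of the previous line (assumed log format)
  multiline.pattern: '^\d{4}-\d{2}-\d{2}'
  multiline.negate: true
  multiline.match: after

output.logstash:
  hosts: ["logstash:5044"]   # hypothetical Logstash endpoint
```

Because filebeat reads each container's log file separately, the multiline merging happens per container, which sidesteps the interleaving problem this whole issue is about.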

@Openpalm

@jordansissel fixed. and thank you. luckily, i have a filebeat setup at hand that i've prepared before.

also thank you for the emotional circuit breaker nudge.

@jordansissel
Contributor

@Shokodemon 👍 I hope filebeat gets you on a path to success! Let us know (probably in elastic/beats or on discuss.elastic.co) if you have issues doing multiline w/ filebeat for docker logs.

@jhmartin

jhmartin commented Dec 20, 2018

Per moby/moby#17763, docker considers the json-files private and won't support changes that make it easier to read them. This makes gelf logging attractive, as it doesn't require fighting dockerd or running filebeat as privileged.

From that issue, docker specifically said this in relation to exposing the json-files for logstash consumption:

If you want to use logstash, please use one of the available logging drivers (syslog, fluentd, journald, gelf), which logstash seems to support any/all of natively.
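As a concrete sketch of that advice, wiring a container to the gelf driver via docker-compose could look like this (the service name, image, and address are hypothetical):

```yaml
services:
  app:
    image: myorg/app   # hypothetical image
    logging:
      driver: gelf
      options:
        # address of the Logstash gelf input (hypothetical host/port)
        gelf-address: "udp://logstash.example.com:12201"
```

This is the setup the original poster describes at the top of the issue, so it still leaves the multiline question to be solved on the Logstash side.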

@danielmotaleite

Yes, docker does not officially support parsing of its log files and prefers the docker plugin... elastic built the docker support on the log file because it works with all docker versions (not only recent ones, as with the support for docker log plugins) and is easier to work with, based on the existing filebeat code.
As a bonus, running docker logs still works fine, and huge messages can be merged with the multi-line support.

If using the docker plugin support, you would be limited to the latest docker versions, it would take longer to include support in filebeat, and docker logs would not output anything. While the docker plugin code supports partial messages generated by docker, most plugins do not support it.

But fear not, if you do not trust the filebeat docker log parser, just use a docker log plugin, like kafka-logdriver, to output logs to kafka:
https://github.com/MickayG/moby-kafka-logdriver

or redis
https://github.com/pressrelations/docker-redis-log-driver

or sematext (I do not know enough about this one, but I think it can output to elasticsearch):
https://github.com/sematext/sematext-agent-docker
https://hub.docker.com/_/sematext-agent-monitoring-and-logging
