
Gelf input should use codecs #37

Open
Teudimundo opened this issue Apr 6, 2016 · 26 comments

@Teudimundo

I'm trying to use a multiline codec with the gelf input. Unfortunately it looks like it is ignored. I've checked the code, and it seems the codec is completely ignored (I can barely read Ruby, though). The same conclusion appears to have been reached in logstash issue #4308.
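
For reference, this is the shape of the configuration I'm trying; the port and pattern here are just illustrative:

    input {
      gelf {
        port => 12201
        codec => multiline {
          pattern => "^\s"
          what => "previous"
        }
      }
    }

Events arrive exactly as if no codec were configured at all.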

@Teudimundo

To make multiline work, stream identity support is required.

@guyboertje

@Teudimundo - as I see it, there are a few problems with using the multiline codec with this input.

  • as you correctly identify, some kind of stream identity must be found.
  • multiline is expecting to receive an endless stream of lines and an Event is generated each time line accumulation stops.
  • Decompression and dechunking are done in an external library.

From the GELF doc (an example payload follows this list):

  • the format after decompression/dechunking is JSON
  • host seems a good candidate for the stream identity.
  • short_message is mandatory
  • full_message is optional, the doc suggests this is where a stack trace is to be found.
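
For illustration, a GELF 1.1 payload (after dechunking/decompression) looks roughly like this; the field values here are invented:

    {
      "version": "1.1",
      "host": "example.org",
      "short_message": "NullPointerException in OrderService",
      "full_message": "NullPointerException in OrderService\n\tat com.example.OrderService.place(OrderService.java:42)",
      "timestamp": 1459944000.123,
      "level": 3
    }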

I have some confusion about short_message and full_message: if full_message is supplied, does it include the text of short_message, or is it expected that they are concatenated?

Are you expecting to receive many GELF messages with one or some lines that make up the logical multiline text?

If the logical multiline text is an error message + stacktrace, why are they not in the short_message and full_message fields?
If it's not a stacktrace, what is it?

@Teudimundo

I see the problems. In my case I'm using the docker log driver, so I cannot control the short/full message fields, because the driver just creates an event for each line (and is obviously completely unaware of what is in the line). This also makes 'host' a poor choice for the stream identity, which I would like to base on a custom field (container_id, for instance).

If multiline just collected events into groups and delegated the "merging" to the input, the input would be able to make some of those decisions itself, possibly guided by some configuration (such as the field that contains the data to be merged).

But that would be a completely different approach and I don't expect it to be implemented; the sketch below is only to illustrate the idea.
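
Purely hypothetical, but the interface I have in mind would look something like this (all names here are invented):

    # hypothetical interface: the codec only groups raw events per stream;
    # the input decides how to merge each group into one event
    @codec.accumulate(event, identity) do |events_for_stream|
      merged = LogStash::Event.new(
        "message" => events_for_stream.map { |e| e.get("message") }.join("\n")
      )
      queue << merged
    end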

@lacarvalho91

Is there any update on this? I'd also like to be able to use the multiline codec for the gelf input, instead of using the multiline filter.

@vingrad

vingrad commented Jun 19, 2016

I need this option too.

@guyboertje

It is very unlikely that this will be implemented anytime soon.
The underlying tech in the gelf input creates an event from the JSON. The multiline codec only works on lines that eventually become events.

We are working on a different way of processing raw source data in an input. We call this Event Milling (as in a Saw Mill that transforms raw logs). See this discussion issue.
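
In pseudo-Ruby, the current flow is roughly the following (a simplified sketch, not the actual plugin source):

    # simplified sketch of today's gelf input flow (not the actual source)
    def run(queue)
      loop do
        data, _client = @udp.recvfrom(8192)
        payload = Gelfd::Parser.parse(data)  # dechunk + decompress (external gelfd library)
        map = JSON.parse(payload)            # after that, GELF is plain JSON
        queue << LogStash::Event.new(map)    # event goes straight to the queue;
      end                                    # the configured codec is never consulted
    end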

@tobilarscheid

Hi,

we would really love to use GELF with the docker log driver and get our Java stack traces as multiline events. Please keep us posted once there is any progress that does not involve using the multiline filter.

Thanks!

@guzmo

guzmo commented Aug 10, 2016

Hey,

I've been trying to find out how to use any input with the multiline codec from multiple sources, but all the issues are still open (some were created last summer). Is anyone working on these issues? We can't use this if it's not supported; we can't create hundreds of separate Logstash instances to get a 1:1 relation between Logstash and an application. I tried putting multiline in the filter, but it's terribly slow; it started losing logs as soon as I pushed it a bit.

Thanks!

@flypenguin

Yes, any updates here? We're also hitting this with Docker log output, and I would appreciate any info about the current state (even if it might be disappointing ;))

@mathiasbn

We gave up on finding an existing solution to the stacktrace problem and implemented an 800 LoC "Lokstash" (it's in Kotlin) that uses the dockerApi project and the Elasticsearch Java lib. Unfortunately it means we need one instance per Docker server.

@flypenguin

Hm. Any docs on this? Did you open-source it? Is it usable for others (read: me)? :)

@mathiasbn

@flypenguin Not really. I guess it could be open-sourced, but it's pretty special-cased to our use case. My point was just that we spent less time implementing it ourselves than we had spent searching for and testing existing solutions.

@jhmartin

For reference, a GELF message from Docker (as this input emits it) is below:

{
           "version" => "1.1",
              "host" => "worker1",
             "level" => 6,
          "@version" => "1",
        "@timestamp" => "2016-08-25T18:06:12.365Z",
       "source_host" => "172.17.0.1",
           "message" => "10.255.0.4 - - [25/Aug/2016:18:06:12 +0000] \"GET /index.html HTTP/1.1\" 200 612 \"-\" \"ELB-HealthChecker/1.0\" \"-\"",
           "command" => "nginx -g daemon off;",
      "container_id" => "768ba63e73a81c6e1b7b172f6011207be39550160a8aa868dcd5ba522117a55c",
    "container_name" => "determined_hopper.3.a92k39txk1d7yo269o1wv1m4f",
           "created" => "2016-08-25T17:39:16.911988648Z",
          "image_id" => "sha256:4efb2fcdb1ab05fb03c9435234343c1cc65289eeb016be86193e88d3a5d84f6b",
        "image_name" => "nginx:latest",
               "tag" => "",
              "type" => "gelf-test"
}

container_id is perfect for identifying lines from the same source. With this data, is there a reason the 'message' field couldn't be passed off to a multiline codec for collapsing?

@flypenguin

Yup, a solution for this would be just awesome. The use case is pretty clear to me, first-hand.

@mancej

mancej commented May 12, 2017

Agreed, we are also stuck due to this exact issue. Merging multi-line gelf messages from the docker log driver in logstash seems impossible.

@vbohata

vbohata commented Jun 23, 2017

+1, a big problem for me too. I need multiline support for GELF, preferably with stream identity support.

@guyboertje

@jhmartin, @flypenguin, @mancej, @vbohata

I will try to explain why this is not so easy, in an attempt to stop all the +1's. It is on my radar, but we are few on the LS team and there is much preparatory work we need to do before a sustainable solution to this problem can be realised.

The API for a codec is:
def decode(data) - data meaning a line of text. The method does something with the data, creates a Logstash Event object, and passes the event to a closure that closes over a reference to the queue; the codec does not have a direct reference to the queue object.

The API for the identity_map_codec (AKA IMC, a codec wrapper with identity support; the multiline codec is usually the wrapped codec) is:
def decode(data, identity = nil, &block) - it's up to the caller to supply the identity. In this case the caller would be this gelf input (meaning the input code would have to change).

Each new identity seen will create a fresh copy of the multiline codec because we have to keep the accumulations separated.

The main piece of complexity is that the multiline codecs, via the IMC, need to flush their accumulated lines when a timeout occurs and no new multiline boundary is seen. As this timeout may occur asynchronously (i.e. not while the decode closure is available to call to enqueue the flushed event), we have to do a bit of hoop-jumping to make the closure available to all. These timeout flushes operate in a different thread than the input thread, so we have to take thread safety into account.

If the LS input design were different (called Event Milling, linked above), we could simply add a new processor between the source (the gelf UDP listener) and the IMC to extract the identity and the message from the event built by the source and pass those to the IMC, and the IMC would pass the result to the queue.
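
To make that concrete, the input-side change would have to look roughly like this. This is an untested sketch using the IdentityMapCodec from the logstash-codec-multiline gem; the field names (container_id, short_message) are taken from the examples above:

    require "logstash/codecs/identity_map_codec"

    # wrap the configured codec so each identity (e.g. a container id)
    # gets its own multiline accumulator
    @codec = LogStash::Codecs::IdentityMapCodec.new(@codec)

    def process_gelf(queue, map)
      identity = map["_container_id"] || map["host"]  # underscore-prefixed in raw GELF
      @codec.decode(map["short_message"], identity) do |event|
        queue << event
      end
    end

The sketch glosses over exactly the hard part described above: the asynchronous timeout flushes still need a thread-safe path to the queue.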

@eguven

eguven commented Oct 18, 2017

I wanted to note that, since the multiline filter is deprecated in logstash-5.x, this is a blocker for anyone who wants to upgrade and keep multiline handling intact.

@tonsV2

tonsV2 commented Jul 2, 2018

Is there any news on this issue? Being able to handle multiline GELF input is crucial.

@caub

caub commented Jul 17, 2018

I solve that problem by logging errors as single-line JSON from my apps.

@phynias

phynias commented Aug 23, 2018

Any updates on this?

@caub

caub commented Aug 23, 2018

Filebeat can do that job well too.

@albgus

albgus commented Oct 2, 2018

No real solution since the issue was posted in 2016. I think the conclusion is to look into another way to aggregate Docker logs.

@thenewguy

Looking for a workaround. If I have control of the log formatter used by the application, is there any way I can encode newlines and then replace the encoded character with a newline using some sort of Logstash filter?

@thenewguy

Well, I ended up URL-encoding the logs in the application. But it is ugly; I would appreciate a real solution that didn't obfuscate the original logs.
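
For anyone copying this approach, the decode side is just a urldecode filter (assuming the logstash-filter-urldecode plugin is installed):

    filter {
      urldecode {
        field => "message"
      }
    }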

@DelDennis

Still nothing on this?
