Support for AMQP (RabbitMQ preferred) as transport #190

lxndrp · 2014-04-05T13:17:26Z

Folks,

I would appreciate very much if you'd consider adding AMQP as an output transport to lumberjack.

At least for the RabbitMQ and ActiveMQ implementations of AMQP, authentication, encryption, and compression are supported, so that the main requisites for a new transport are met.

The background is that we are running very large numbers of potential log shippers, and I'd like to distribute the load via our RabbitMQ HA clusters rather than via direct lumberjack -> logstash connections, as we cannot afford loss of messages on server crash or overload. RabbitMQ brings features for federation, queue rerouting, throttling and rate limiting, replication and many other things which logstash does not have.

Let me know what you think.

Cheers,
Alexander


graph1zzlle · 2014-04-07T08:51:21Z

+1 !

Same use case here !

petebowden · 2014-04-09T02:28:52Z

Why not lumberjack -> logstash (collector) -> AMQP -> logstash (processing)?

Understood that it's another application to run.
Also lumberjack should stop shipping if logstash goes down?

Pete Bowden

petebow4@gmail.com


graph1zzlle · 2014-04-09T08:11:42Z

That would introduce a useless logstash collector layer, and under load it might need multiple collector instances, load-balanced as well, so...

driskell · 2014-04-09T09:14:13Z

Hi guys,
I believe the elasticsearch team were already working on a ZeroMQ implementation, but time was diverted to finishing LogStash 1.4. I think once things calm down there and resources get freed, they'll probably finish it off.
Jason

jordansissel · 2014-04-09T16:33:52Z

It is unlikely logstash-forwarder will support AMQP. If you need AMQP support, logstash can do this, and as suggested above, I also recommend logstash-forwarder -> logstash -> amqp

This project aims to solve a small specific problem and adding AMQP support is against that goal, in a way. Logstash itself supports many protocols, AMQP included, so I recommend you use that instead if you need to use AMQP :)

joerocklin · 2014-06-09T12:09:02Z

@jordansissel can you explain the 'small specific problem' with which AMQP support conflicts? Based on the documentation in the Readme, the output channel is only discussed with regard to the requirements for adding a new protocol and AMQP meets these goals.

jerrac · 2014-08-27T17:00:49Z

Those quotes are from the readme:

> Actual Problems: Logstash, for right now, runs with a footprint that is not friendly to underprovisioned systems such as EC2 micro instances; on other systems it is fine. This project will exist until that is resolved.

My setup would benefit greatly from being able to run something with a lower RAM footprint than logstash itself. That's what I want to use logstash-forwarder for.

> The lumberjack protocol used by this project exists to provide a network protocol for transmission that is secure, low latency, low resource usage, and reliable.

RabbitMQ supports ssl connections. http://www.rabbitmq.com/ssl.html

Creating another server, or instance of logstash, just to stage logs from logstash-forwarder to rabbitmq feels kind of messy. It introduces another place where things could break down.

logstashforwarder -> logstash stager -> rabbitmq -> logstash indexer -> elasticsearch is a pretty long chain...

Anyway, that's why I'd like to see logstash-forwarder be able to send to rabbitmq. :)

jordansissel · 2014-08-27T17:15:33Z

@joerocklin I lost interest in AMQP around 2-3 years ago when the AMQP ecosystem fractured into brokers that supported 0.8, 0.9, 0.9.1, and 1.0. I don't know the current state of things, but I can confirm that Logstash renamed the "amqp" plugin to "rabbitmq" simply because through that fracture, and perhaps by accident, the only known-supported broker for the logstash amqp plugin was RabbitMQ - nothing else really worked due to protocol deviations. So from my experience, "AMQP" is this nebulous cloud of things that probably actually speak different protocols despite claims of using whatever they are calling "AMQP."

If we focus specifically on RabbitMQ, what is the benefit of putting a broker in between lsf and logstash, where today the protocol used does not require a broker? Operationally, it has been wonderful for users that a broker has not been required between lsf and logstash.

> Creating another server, or instance of logstash, just to stage logs from logstash-forwarder to rabbitmq feels kind of messy.

RabbitMQ is a message passing system, so the end goal of moving your logs is never going to be "store them in rabbitmq" because rabbit is a transit system, not a storage system. Where do they go after that?

I want to keep logstash-forwarder easy to maintain and support, and it's not clear to me how adding RabbitMQ support would make it easier to maintain (more code) and easier to support (more complexity in setup, RabbitMQ is not well understood by many who use it based on my experiences).

jerrac · 2014-08-27T17:23:01Z

My need for a queue is to prevent loss of data when the indexer goes down.

I just saw this: http://michael.bouvy.net/blog/en/2013/12/06/use-lumberjack-logstash-forwarder-to-forward-logs-logstash/#comment-1423397187 which makes me think (as I commented there) that logstash-forwarder removes my need for a queue entirely. Am I right?

jordansissel · 2014-08-27T17:26:08Z

> My need for a queue is to prevent loss of data when the indexer goes down.

The design of the logstash-forwarder is to prevent loss of data when the remote server goes down. There is no need for a broker agent in between lsf and Logstash.

So, you are right!

The "queue" that logstash-forwarder uses is actually the files it is reading from, and it uses a network protocol that ensures reliable delivery of that queue's contents (the lines of your logs) to downstream servers.
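The "files as the queue" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not logstash-forwarder's actual code: `read_batch`, `ship`, and the `send` callback are hypothetical names, and `send` is assumed to return `True` only when the downstream server has acknowledged the batch.

```python
def read_batch(path, offset, max_lines):
    """Read up to max_lines complete lines starting at byte offset.

    Returns (lines, new_offset). The log file itself is the queue; the
    offset is the only state the shipper has to remember.
    """
    lines = []
    with open(path, "rb") as f:
        f.seek(offset)
        while len(lines) < max_lines:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # EOF or a partially written line: leave it for later
            lines.append(line.rstrip(b"\n").decode("utf-8"))
            offset = f.tell()  # advance only past complete lines
    return lines, offset


def ship(path, offset, send, max_lines=2):
    """Send one batch; advance the offset only if the receiver acknowledged."""
    lines, new_offset = read_batch(path, offset, max_lines)
    if lines and send(lines):
        return new_offset  # acked: safe to move past these lines
    return offset  # no ack, or nothing to send: retry from the same spot
```

Calling `ship` repeatedly while the receiver is down returns the same offset every time, so nothing is skipped; the lines simply stay in the file until a later call is acknowledged.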

jerrac · 2014-08-27T17:30:24Z

That means I can reduce my chain to logstashforwarder -> logstash indexer -> elasticsearch. Nice. Thanks for the quick answers!

joerocklin · 2014-08-27T17:30:30Z

RabbitMQ is what I'm interested in, so just focusing on it is fine with me.

One scenario to consider is an already deployed RabbitMQ infrastructure which allows traffic from various network segments to communicate with it. If I can plug my messaging transit into a system which is already established and trusted for security, then my cost for getting messages from point A to point B is drastically reduced, as the infrastructure is already in place. No new systems to deploy, no new ports to open or traffic patterns to identify. Depending on how RabbitMQ is deployed, there could be extra durability in the message transit in the event of failures.

So in my case, I'm looking for:
logstash-forwarder -> existing RabbitMQ infrastructure -> logstash processing nodes

jordansissel · 2014-08-27T17:37:29Z

@joerocklin I am totally happy to have you use RabbitMQ, btw. In this case, you can achieve success by using something other than logstash-forwarder. Logstash itself supports rabbitmq output. Further, there are probably a dozen other projects that exist to forward logs and also support different protocols. Can you use one of those? If not, why not?

joerocklin · 2014-08-27T17:55:22Z

@jordansissel For the reasons noted on the logstash page, running logstash proper on the app nodes is rather 'heavyweight', so something with a lighter footprint is desirable. Whether this is a 'real' problem or a perceived one is debatable, but either way: it's a problem. Using something from the same authors is nice, as the perception is that changes will occur in lockstep and we're less likely to get hit with strange update issues.

Perhaps there need to be some updates to the documentation to answer some extra questions:

re: logstash-forwarder existing until the problems (under-provisioned systems going away or logstash getting lighter-weight) no longer exist - Since it's unlikely that under-provisioned systems will go away, are there plans for changing logstash in such a way as to remove the need for logstash-forwarder?
re: Future Protocol Discussion - RabbitMQ can handle the transport security requirements, and the necessary pieces can be included in a packaged format (since there are Go libs for RabbitMQ: https://github.com/streadway/amqp). It doesn't care what the message content is, so it can be compressed and be whatever the other side needs. I have no idea what this would do to the resulting binary size, and that could be a concern.

Why not use something else: I'm looking at some of the other options (my original comment was back from June). I would still really like to use something from the same authors of elasticsearch for the reasons mentioned above. If you know of other reliable projects that provide answers to requests that you do not plan to implement, it would be really helpful to provide some links to them.
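To illustrate the point above that a broker does not care about message content, here is a minimal sketch of packing a batch of log lines into an opaque, compressed message body. The JSON-plus-zlib encoding and the names `pack_batch` and `unpack_batch` are assumptions made for this example; the publish step is deliberately left out, since it depends on the client library in use.

```python
import json
import zlib


def pack_batch(lines):
    """Encode a list of log lines as one compressed, opaque message body."""
    body = json.dumps(lines).encode("utf-8")
    return zlib.compress(body)


def unpack_batch(payload):
    """Reverse pack_batch on the consuming side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

The producer would publish the bytes returned by `pack_batch` as the message body, and the consumer would call `unpack_batch` on whatever it receives. Repetitive log lines tend to compress well, but the actual ratio depends on the data.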

lxndrp · 2014-08-27T18:56:06Z

@joerocklin Regarding alternatives, I can recommend https://github.com/josegonzalez/beaver, a very lightweight log forwarder that talks RabbitMQ (and various other protocols). It is written in Python, and we have been using it on a few hundred machines for about a year now, with no perceivable problems.

@jordansissel Since I opened the initial ticket, I’d like to sharpen the requirements I stated in my initial post:

I am asking specifically for RabbitMQ support, not AMQP. Being fully aware of the utter mess with AMQP standardization, I completely understand your reluctance to work on this. But RabbitMQ is rock-solid, the interfaces are quite stable and, as you stated before, logstash is supporting it already.
What I really want is reliable and resilient end-to-end delivery. Given the amount of logs we have to process, that’s why I’d love to see RabbitMQ support, because it helps me build a quite fault-tolerant, scalable setup. Using LSF and logstash directly doesn’t really help me, because then I would have to a) scale logstash collector instances (which can be done, but is more costly than doing this with RabbitMQ) and b) build layers of HA around logstash which I am getting for free with RabbitMQ (Clustering, Federation, Shovels, Routing, etc.).
Although there are other shippers that have a small footprint, I still feel that LSF has an even smaller one. It is a very well written, designed-for-performance, do-one-thing-right Go application; and I like that.
(You might consider this stupid, but anyway.) I feel somewhat better getting a toolchain from one supplier rather than many. Nothing against Jose Gonzales, who did a great job in writing beaver; but still, if LSF is something supported by Elasticsearch (as a company contributing to Open Source), I’d prefer to get things from there – for the same reason I am using RabbitMQ rather than some esoteric MQ system found somewhere on GitHub.

I hope this makes my intentions a bit clearer.

jordansissel · 2014-08-28T03:29:09Z

I did some thinking about this. Three points came to mind.

First, I am personally resisting rabbitmq due to previous and many bad experiences with AMQP, its fracture, and its complexity and that complexity's impact on users. Summarize my view here simply as "opinion" and we can throw it in the trash because it's of little value to the technical discussion at hand - my opinion of amqp has nothing to do with your opinion, experience, or need of amqp. I want to be clear that I want to take my opinion here out of the discussion, so I'll try to avoid it in the future :)

Second, I don't remember much about AMQP or RabbitMQ, so I have little confidence in my own personal ability to support users on such a feature. This lack of confidence manifests itself in my resistance to the feature. My confidence, like my opinion, should not be forced to impact your business needs, requirements, opinions, or experiences. This is a community, not a "Jordan being by himself" and as such we can remove my fear and lack of confidence from the arguments against rabbitmq support.

Third, we are actively working on a new protocol design, and the current model we are discussing internally doesn't seem like it would work well over RabbitMQ simply because there's going to be new needs for bidirectional communication between lsf and downstream servers. This new model is not set in stone, but it might be annoying to try and shoe-horn over RabbitMQ's protocol. I'll know more once we further discuss this internally.

Given points 1 and 2 being maligned opinion and lack of confidence, we can throw that out. I am willing to consider RabbitMQ as a transport (even if I can't support it personally due to lack of knowledge), but only once we figure out what the new protocol concept is going to look like.

Does this make sense?

jordansissel · 2014-08-28T03:31:20Z

@joerocklin and @lxndrp and everyone else on this ticket: I very much appreciate your efforts and time spent in this discussion.

alphazero · 2014-08-28T14:25:46Z

@lxndrp Hi Alex, as @jordansissel mentioned, we are in the review stage of the new protocol and other enhancements that would address the reliability and resilience issues, and of course would love to get the community feedback on this. I'll post an update on this here when we've gone through our internal review cycle on this.

lxndrp · 2014-09-01T09:02:31Z

Thanks @jordansissel, @alphazero for the update. I'd be very happy to provide feedback or otherwise help with LSF. Let me know if you need anything.

mohben · 2014-09-02T20:25:44Z

@jordansissel you say that LSF can observe low traffic and network crashes and manage log shipping accordingly. What if a network problem happens just after a log has been shipped? Will that log be lost? Does LSF really ensure guaranteed delivery (using e.g. a dead letter channel)?

I'd appreciate your hints, folks.

driskell · 2014-09-02T20:29:09Z

@mohben the guarantee is that it will be delivered at least once to the receiver (logstash)

In your scenario it will ship the log again since it could not guarantee the remote side received it successfully as no acknowledgement was received.

mohben · 2014-09-02T20:47:06Z

Hmmm, so LSF is expecting an ack from its output?

driskell · 2014-09-02T20:50:21Z

Yes, the lumberjack protocol has acks. Logstash will ack once it has queued the log.

However, if Logstash crashes it can lose whatever is in its queue, which is 10 items. But that's it. There's no end-to-end guarantee from forwarder to Elasticsearch, but having a guarantee on the network hop from forwarder to Logstash at least reduces the impact significantly.
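The at-least-once behaviour described in this exchange can be simulated with a small sketch. It is not the lumberjack wire protocol: `Receiver`, `send_at_least_once`, and the `ack_lost` callback are hypothetical names, and the per-event sequence number is an assumption made to keep the example short.

```python
class Receiver:
    """Stands in for Logstash: queues whatever arrives, then acknowledges."""

    def __init__(self):
        self.queued = []

    def receive(self, seq, event):
        self.queued.append((seq, event))
        return seq  # the ack names the sequence number that was queued


def send_at_least_once(events, receiver, ack_lost, max_attempts=5):
    """Resend each event until an ack for it comes back.

    ack_lost(seq, attempt) -> bool simulates the network dropping an ack
    after the receiver has already queued the event.
    """
    for seq, event in enumerate(events, start=1):
        for attempt in range(1, max_attempts + 1):
            ack = receiver.receive(seq, event)
            if ack_lost(seq, attempt):
                continue  # the sender never saw the ack, so it must retry
            if ack == seq:
                break  # acknowledged: move on to the next event
        else:
            raise RuntimeError(f"no ack for event {seq}")
```

If the ack for event 2 is dropped once, the receiver ends up with event 2 queued twice. That duplicate is the price of "at least once": nothing is lost on the network hop, but downstream consumers may see repeats.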

cemuzunlar · 2014-09-29T23:54:44Z

I was about to add a feature request for AWS SQS output and then saw that a related issue (output to Kafka) was directed here, so I want to add my notes here.

First of all, thanks for the wonderful software, it is lightweight and works really well.

And my notes on the subject:

Suppose we have many servers generating lots of logs, and both server count and log volume tend to grow. But we don't want or need to process, store, and query the logs in real time and at the same pace as the log generation, because we want to lower costs.

For example: Most mobile games have an activity graph which tends to increase in certain times of the day and then gradually decrease in certain times of the day. We can tolerate logs to pile up in busy times because we know that there will be non-busy times and logs will slowly be drained.

What we need is:

A light and fast shipper sending raw logs from the local machine to a remote intermediary storage (AWS SQS etc.) which is easy to scale and durable. So we'll be sure our log message moved out of the generating machine and stored in a reliable storage for later consumption.
We'll then consume the storage at a pace we need. There may be millions of logs in the storage and we may also be adding millions of new logs every second to the storage. But we can consume with one or more small logstash machines slowly and output to the final destinations. (ElasticSearch etc.)
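The buffering argument above can be made concrete with a small simulation. The function name `simulate_backlog` and the numbers used below are made up for illustration; the sketch only shows how a fixed-rate consumer drains a queue that fills during busy periods.

```python
def simulate_backlog(produced_per_tick, drain_per_tick):
    """Return the queue depth after each tick.

    produced_per_tick: events arriving in the queue during each tick.
    drain_per_tick: the most a (small, fixed-size) consumer can process per tick.
    """
    depth = 0
    history = []
    for produced in produced_per_tick:
        depth += produced
        depth -= min(depth, drain_per_tick)  # never drain below empty
        history.append(depth)
    return history
```

For example, with arrivals of `[100, 400, 400, 100, 0, 0]` and a consumer that handles 200 events per tick, the depth peaks at 400 during the busy ticks and returns to 0 once traffic quiets down. The consumer can be sized for the average rate rather than the peak, provided the queue can hold the backlog.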

gdlx · 2014-11-03T17:26:49Z

It seems that people here (including me) are looking for some kind of "rabbitmq-forwarder". logstash-forwarder is provided by logstash community as a lightweight tool to send data to logstash. Why should it get fatter to communicate with something else ?

A tool sending data to rabbitmq should be provided by rabbitmq community! And well...it actually exists: it's rabbitmq itself, with the Federation plug-in: https://www.rabbitmq.com/federation.html

The only thing that I miss now is a clean way to send data to the rabbitmq "local agent" from stdin... Many small scripts can do that, but nothing in the "clean ecosystem", i.e. maintained, updated, packaged, ...

driskell · 2014-11-03T17:43:50Z

I played with ZeroMQ some time ago and I now have a stable implementation in Log Courier if you are interested, which is working quite well for me. Log courier is based on logstash-forwarder - it has all my major changes and improvements.

The idea is to completely phase out a requirement of any other item in the stack except the shipper and Logstash. So we can set up shippers to load balance events across multiple Logstash instances and automatically fail over and retransmit as required.
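The general shape of "load balance across several receivers and fail over when one is down" might look like the sketch below. This is not Log Courier's implementation; `send_with_failover`, `ship_round_robin`, and the endpoint callables are hypothetical, and each endpoint is assumed to return `True` on acknowledgement or raise `ConnectionError` when unreachable.

```python
def send_with_failover(batch, endpoints, start=0):
    """Try each endpoint in turn, beginning at index start.

    Returns the index of the endpoint that acknowledged the batch.
    """
    n = len(endpoints)
    for step in range(n):
        i = (start + step) % n
        try:
            if endpoints[i](batch):
                return i
        except ConnectionError:
            pass  # endpoint is down: fall through to the next one
    raise ConnectionError("no endpoint acknowledged the batch")


def ship_round_robin(batches, endpoints):
    """Spread batches across endpoints, retransmitting to the next on failure."""
    delivered_by = []
    next_index = 0
    for batch in batches:
        i = send_with_failover(batch, endpoints, next_index)
        delivered_by.append(i)
        next_index = (i + 1) % len(endpoints)
    return delivered_by
```

With three endpoints where the middle one is down, batches alternate between the first and third. The batch that would have gone to the dead endpoint is retransmitted to the next one instead of being dropped.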

Thought it worth mentioning if people are looking for something similar. I'll be improving it more over time too, so feedback is welcome. Building it with ZeroMQ 3.2 is fairly straightforward on most distributions, as there are zeromq3 packages available (CentOS/Ubuntu etc). It can also run with Curve encryption if you manage to get ZeroMQ 4.0 packages. I'm planning to provide CentOS packages with zeromq3 support soon.

gdlx · 2014-11-05T09:20:25Z

@driskell I've tried Log Courier which works fine but unfortunately doesn't fit my need as it doesn't support JSON as input codec. As it seems you've planned to support it, I'll benchmark it as soon as it's available.

driskell · 2014-11-05T09:29:24Z

@gauthier-delacroix You can still use the JSON filter provided by Logstash, which is very quick (it uses an extremely fast Jackson JSON library). This is how I manage it at the moment. But yes, I do plan to allow additional codecs, since adding them does not increase resource usage at all unless you enable them. That keeps it lightweight but adds flexibility when needed.

gdlx · 2014-11-05T18:19:18Z

@driskell As I'm using varnishncsa to generate my logs, I can directly format them in JSON. Having around 50kRPS peaks is a good reason to avoid using logstash filters as much as possible (I'll need some anyway) and distribute the overhead across my varnish servers (which need as much free RAM as possible but have a lot of free CPU time).

I'll keep an eye on Log Courier anyway and try it again as soon as the JSON codec is available, because I'm interested in many of your features.

abhishekdelta · 2014-11-12T10:27:50Z

Just want to add a big +1 for the lsf -> RabbitMQ feature request. I have a very similar use case to @joerocklin and @lxndrp, where I want to leverage an existing RabbitMQ infrastructure to transport logs. Having said that, I really appreciate the effort put in by the authors in actively discussing this new feature in spite of lacking technical confidence.

karlatkinson · 2015-03-25T14:06:06Z

LSF -> SQS would be awesome 👍
I've got a ton of EC2 instances I want to ship logs off without having to use full-on logstash.

jerrac · 2015-03-25T17:02:17Z

From the sounds of things, lsf's functionality is going to be rolled into logstash itself. See http://www.elastic.co/guide/en/logstash/roadmap/current/index.html
Hopefully lowering RAM usage is part of that.

--David Reagan


jonatanblue · 2015-11-05T13:57:57Z

Thank you for discussing this issue, I really appreciate the work you're doing and the thought going into this. Here is my contribution to why this is an important feature when you start running Logstash and Elasticsearch at scale.

@jordansissel asked:

> what is the benefit in doing putting a broker in between lsf and logstash, where today the protocol used does not require a broker?

Elastic recommends using a broker/queue when "data coming into a Logstash pipeline exceeds the Elasticsearch cluster’s ability to ingest the data".

The assumption that forwarders can hold on to data until successfully pushed is invalid if your servers are part of an automatically scaling cluster or group. For example, servers in an AWS AutoScaling Group may be terminated at any time, and any log events not yet pushed will then be lost. You must use a queue, or you will regularly lose data.

The recommended queueing pipeline design puts a shipper between the Logstash Forwarders and the queue:

In this case the forwarder->logstash only limitation adds unnecessary complexity. I understand there are issues involved with supporting other tools, like RabbitMQ, but from a systems design perspective the shipper in this picture is pure waste. If you already have a queue where all events are buffered, why not forward the log events straight to the queue? Why send them via an additional Logstash server? This is not a problem with only a handful of forwarders, but with hundreds or thousands of them you will need an increasing number of powerful shippers.

@gauthier-delacroix suggests shifting the forwarding responsibility to RabbitMQ - I'm all for that, but if queueing is part of the recommended design of a scalable Logstash, wouldn't it make sense for Logstash Forwarder to facilitate it?

There is a related discussion at elastic/logstash#3693, but I'm not sure how/if it addresses this issue.

adiworkoholic · 2016-03-01T09:57:01Z

It's been 4 months since the last activity on this. Any updates on the way forward ?

ruflin · 2016-03-01T10:04:01Z

As logstash-forwarder is no longer under active development and was replaced by filebeat, it is best to continue this discussion under the following issue: elastic/beats#943

jordansissel mentioned this issue Aug 29, 2014

Support sending logs to Kafka, instead of directly to logstash #258

Closed

tbragin added the enhancement label Jul 31, 2015

ruflin closed this as completed Mar 1, 2016
