Poor Logstash Throughput #27
Thanks for reporting! |
@ingshtrom I have a beta version for enhancing the throughput now in the branch https://github.com/SumoLogic/logstash-output-sumologic/tree/byi-asyc-mem-queue. Since there are significant changes to the code structure, I may need more time to test and refine it before pushing to RubyGems. But if you want, you can get the beta version now for testing.
Install:
As a sample config file:
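A rough sketch of one way to install a plugin gem built from that branch, assuming a standard Logstash 6.x layout under /usr/share/logstash (the clone path and the version in the built gem's filename will vary):

```sh
# Build the gem from a checkout of the branch
git clone -b byi-asyc-mem-queue https://github.com/SumoLogic/logstash-output-sumologic.git
cd logstash-output-sumologic
gem build logstash-output-sumologic.gemspec

# Install the locally built gem into Logstash
/usr/share/logstash/bin/logstash-plugin install ./logstash-output-sumologic-*.gem
```

And a minimal output section to start from (the endpoint URL is a placeholder for your own HTTP source URL, and option availability in the beta branch should be double-checked against its README):

```
output {
  sumologic {
    url               => "https://collectors.sumologic.com/receiver/v1/http/<your-token>"
    source_category   => "logstash"
    compress          => true
    compress_encoding => "gzip"
  }
}
```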
|
Looks like your image has a lower version than my build environment.
Can you update the image to 6.2.x? |
Gotcha, that is correct. Let me update. |
@bin3377 Do you have a recommended config for high throughput? I've tried several setups that include increasing the size of requests and increasing the amount of parallel processing, but to no avail. I also tried the config you had above as a starting point. Examples:
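For reference, the knobs involved look roughly like this; an illustrative fragment rather than the exact examples tried, with parameter names taken from the plugin and comments reflecting my understanding of what they control:

```
output {
  sumologic {
    url        => "https://collectors.sumologic.com/receiver/v1/http/<your-token>"
    compress   => true     # smaller payloads per HTTP request
    interval   => 10       # flush a pile after this many seconds, even if it is not full
    pile_max   => 1024000  # bytes per pile before it is pushed to the sending queue
    queue_max  => 4096     # piles held in the in-memory queue
    sender_max => 10       # parallel HTTP sender threads draining the queue
  }
}
```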
|
It depends on the bottleneck of the process; on the other side, this version is thread safe, so theoretically you can use multiple plugin instances in parallel as workers (https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html).
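For the pipeline-level side of that tuning, the usual knobs live in logstash.yml; the values below are illustrative only:

```yaml
# logstash.yml (illustrative values)
pipeline.workers: 8        # defaults to the number of CPU cores
pipeline.batch.size: 250   # events per worker batch; larger batches trade memory and latency for throughput
pipeline.batch.delay: 50   # milliseconds to wait for a batch to fill before flushing it
```
|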
That was a really awesome explanation of each property! Thank you. I will spend a bit more time testing and report back! |
I'm currently running into a weird error while trying to debug things. In the logstash logs I get an error about
When I look in logstash I can see the logs in there, so it's odd that I get this error. I guess my question is: what is this error supposed to mean? I would presume that I shouldn't be getting it. |
I found the lines in https://github.com/SumoLogic/logstash-output-sumologic/blob/byi-asyc-mem-queue/lib/logstash/outputs/sumologic/sender.rb that generate these errors. I have seen the error here as well, but less often. After looking at this file and testing the URL we pass to the SumoLogic plugin, I have drawn 2 conclusions about potential bugs in the code. Please correct me if I am wrong or you have other findings:
I should also note that these server errors do NOT show up when we only supply the
Thank you, and please advise on recommendations. |
The
means the server response is not 200. It could be a server-side problem (like 429 or 503) or a plugin problem (like 400 if the log lines are somehow corrupted). And
normally means the connection to Sumo is broken or, in some rare cases, a timeout because the server did not respond in time (and this can possibly be adjusted with
For more details you can refer to this help doc.
I updated the plugin with some extra logging points; could you please download the plugin again and give it a try? For enabling the logging, you need to update
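One common way to turn on debug output for a single output plugin in Logstash 6.x is a dedicated logger entry in config/log4j2.properties; the logger name below follows the usual logstash.outputs.<plugin> convention and is an assumption here, so double-check it against the updated instructions:

```properties
# config/log4j2.properties -- per-plugin debug logging (logger name assumed)
logger.sumologic.name = logstash.outputs.sumologic
logger.sumologic.level = debug
```

The same level can also be changed at runtime through Logstash's logging API (a PUT to /_node/logging on the monitoring port) without restarting the process.
Thanks! |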
I used the logging config exactly as it is in the example. Here is the sumologic output config that is being used:
and the logs are available at this gist. It isn't outputting the status code, so I'm not sure, but it might be worth knowing if we are getting rate-limited. That would make the most sense as all the requests seem to go through... until they don't. |
Based on the documentation here, which gives an example of throttling for a 10GB account, you would be allowed around 71MB per minute before being throttled. In the SumoLogic UI it shows that we have uploaded around 2.4GB of data so far today. When I look at the logs and when they were ingested, it was primarily during a 5-minute period from 9:20 to 9:25 EST. I'm 99% sure we're being throttled, but you guys should be able to look that up on your end, correct? Because I cannot on my end, since the status codes are not being logged. I did the math, and even if the 2.4GB were spread out evenly over 30 minutes, that is roughly 80MB per minute, which is still above the theoretical threshold of 71MB per minute. So this throttling, coupled with the fact that logstash likes to throttle everything in an attempt to not lose any data, creates a scenario where adding the SumoLogic output plugin essentially grinds all of our log ingestion to a halt. source |
Your guess should be correct. @frankreno will help you adjust the account provisioning to unblock the throttling. Currently the plugin will retry sending in the following cases:
|
We got the throttling figured out. I'm still not getting the expected throughput, but I am still testing out different configurations. In the meantime, I noticed a new error I hadn't seen before. Whenever I try to change the config via live-reload, the next startup results in lots of errors like this:
These are almost always preceded by errors from the previous process that look like this. I just tested, and this stack trace happens with our current config, but I wanted to paste it here in case an issue with one plugin was causing more issues with other plugins. I will update once I do more testing without the throttling. |
This line is not an error. It just means the pile was sent to the queue. Normally you will get this line when the pile is not filled to |
Ok. They showed up more on startup after a reload. I wonder if the queue is somehow not filling at all, since it is continually logging that without any "HTTP request accepted" messages. Hmmm |
Update after a few days. I've gotten logstash to achieve throughput very similar to what we had prior to SumoLogic. The problem is that it is consuming a lot of CPU and a lot more memory. Now that I've proven it's possible to get the throughput we need, I'm playing with the numbers to try to get sustained throughput without as much of a change in resource usage. I've had a lot of other projects take my attention away from this over the last few days, but I should hopefully have a question or an answer by tomorrow. |
Thanks a lot for your update! I do expect it to consume more memory/CPU to keep up the high throughput, since the memory cache and the multiple compressing/sending threads have a cost. Looking forward to seeing your test results. |
Ok, I think we have achieved the expected throughput! I want to reiterate that the increased cost is higher than expected, but it's well within our threshold for the machines we were already running, so I'm not too worried. Here is a graph with some metrics of throughput, CPU, and memory usage over the last day (look for bright yellow for my annotations):
Essentially we are achieving the same message throughput and network usage while using the same CPU and almost double the memory. These machines have plenty of free memory, though, so it's not a problem. Here are the changes made. The SumoLogic output config:
We increased the JVM heap from 1GB to 2GB via the
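In a stock install, that heap bump is typically made in config/jvm.options; the snippet below is the usual way to express it, not necessarily the exact mechanism used in this setup:

```
# config/jvm.options -- raise the default 1g heap to 2g
-Xms2g
-Xmx2g
```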
We changed the logging config to this. Notice that it is the same as you provided, but using
Prior to introducing SumoLogic, we ran with 4 workers and a
I would love to see a more official build. In the meantime, I will continue to let this run over the weekend to see how it fares. Thank you for all of your help thus far! |
I also used this query in SumoLogic to view the throughput from your end:
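An illustrative query of that general shape (not the exact one used; the _sourceCategory value is a placeholder):

```
_sourceCategory=logstash
| timeslice 1m
| count as messages by _timeslice
| sort by _timeslice
```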
Worked like a charm. Also, in Kibana I could see that we were still getting the same throughput, so all of the places where we gauge throughput were showing improved results.
That's an awesome report and valuable analysis! And I'm very happy the new version can unblock you. I will update the documentation today and push a new version to RubyGems so that later on it's installable from the official channel. Thank you! |
The new version is on RubyGems now https://rubygems.org/gems/logstash-output-sumologic |
Hello,
We're trying to send logs to SumoLogic through Logstash. We currently send our logs to an ELK stack, and it maintains a throughput of 25k messages per second, but no matter the configuration we have tried with the SumoLogic Logstash plugin, our throughput drops dramatically every time.
Here are the configurations that we have tried and their results, as shown by graphs of message throughput.
The deviation here shows the dramatic loss in throughput and the dramatic increase in the difference between the logs being sent to the indexer and the logs being sent out of the indexer (to SumoLogic/ELK).
The deviation isn't as big, but it is definitely not acceptable (dropping to about 13k messages per second).
Can you help us achieve higher throughput? We are trying to do a trial of SumoLogic, but we have not yet been able to achieve the throughput that we expect of our Logstash indexers.