Performance regression and seg faults moving from 0.9.5 to 1.0 #551
Comments
A segfault is very strange given that the module is pure Python. So the first thing that jumps out at me is ujson, which is a wrapper around a C library; I would expect that to be the most likely cause of segfaults in your script. Also note that you do not need to import differently in 0.9.5 vs. 1.0.0 -- just import everything from kafka. Thanks for the test script, I'll play around and see what I find.
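For illustration, a minimal sketch of the import point being made here, assuming the legacy classes remain importable from the top-level package in both releases (only the class names mentioned in this thread are used):

    # Works in both 0.9.5 and 1.0.x: the legacy classes are exposed from the
    # top-level kafka package, so no version-specific import path is needed.
    from kafka import SimpleConsumer, KeyedProducer

    # The new interfaces introduced/reworked in 1.0 live alongside them:
    from kafka import KafkaConsumer, KafkaProducer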
The import difference was to be sure we were using the same client that was available in 0.9.5. ujson shouldn't have had any effect; the segfault was a few minutes after the last use of it. I was testing this in a Vagrant environment, which you can use for your own testing if you want. I was able to reproduce the segfault in multiple VMs built from the same Vagrantfile.
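To rule out the C extension, the usual approach (and what was later tried in this thread) is to fall back to the pure-Python stdlib json module. A small sketch; the fallback pattern is illustrative and not taken from the attached script:

    # Prefer ujson (a C extension) when present, but fall back to the
    # pure-Python stdlib json so the serializer can be ruled out as a
    # source of segfaults.
    try:
        import ujson as json
    except ImportError:
        import json

    payload = json.dumps({"key": "value", "count": 1})
    record = json.loads(payload)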
The performance regression seems to be due to the SO_RCVBUF handling that was added for KafkaConsumer / KafkaProducer. I can patch that to behave as before when using the old interfaces. That patch gives me 50K messages per sec in my local testing. I have not been able to reproduce the segfault. I remain skeptical because there is no C code in kafka-python. Any further information you can provide on that would be helpful; otherwise, I'm not planning to investigate that part any further.
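For context, the new classes expose the socket buffer sizing as configuration. A hedged sketch of how the receive buffer could be tuned on the new consumer; the topic name, broker address, and buffer value are placeholders, and exact defaults vary between 1.0.x releases:

    from kafka import KafkaConsumer

    # receive_buffer_bytes maps to SO_RCVBUF on the underlying sockets.
    # A small default here is the kind of setting that can throttle fetch
    # throughput; raising it is one way to experiment with the regression.
    consumer = KafkaConsumer(
        'my-topic',                        # placeholder topic name
        bootstrap_servers='localhost:9092',
        receive_buffer_bytes=1024 * 1024,  # 1 MB instead of the small default
    )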
I'm still able to make it segfault consistently in my Vagrant environment even with ujson replaced with json. I can't seem to get a viable core file out of it; most of them are truncated for some reason. With my most recent Vagrant build, using Kafka version kafka_2.9.2-0.8.1.1, I was unable to get a segfault initially. I placed the test script I provided in a bash while loop, and after a short while it was segfaulting every run. That behaviour remains after letting the Vagrant environment sit idle overnight. The fd leak you fixed yesterday looks like it would be pretty minor if I'm reading it correctly. Do you think it's worth testing the kafka-python client with that fix to see if the segfaults go away? If you want to try using the same environment, you can use it with the following steps:
The fd fix only affects the new classes. You are testing with the old classes, so I don't think it will matter. You might try installing with PR 557, though. I believe that should at least improve the performance you are seeing.
I captured a stack trace that shows the program segfaulting while trying to construct a new tuple. At the time it segfaulted it had allocated slightly over 9 GB of RAM, so it looks like an out-of-memory error. If I can figure out how to get the gdb Python extensions installed in my environment I'll be able to give more information. The (truncated) stack trace, for what it's worth:

    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5b8) at libc-start.c:287
    #35 0x0000000000578c4e in _start ()

It looks like it's running into the Out Of Memory killer, which explains why you can't reproduce it. My Vagrant VM has fairly limited resources:

    [ 5285.531613] Out of memory: Kill process 16916 (python) score 759 or sacrifice child

Running top while running my test script, I can see it allocate several hundred MB every five seconds or so.
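As an aside, the memory growth described here can also be confirmed from inside the test script rather than with top. A small sketch, assuming a Linux host where ru_maxrss is reported in kilobytes; the loop body below is a placeholder for the real fetch loop:

    import resource
    import time

    def log_rss(label):
        # ru_maxrss is the peak resident set size; on Linux it is in kilobytes.
        peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print("%s: peak RSS %.1f MB" % (label, peak_kb / 1024.0))

    # Sample memory roughly every five seconds while the consumer loop runs.
    last = time.time()
    for _ in range(10):          # placeholder for the real fetch loop
        time.sleep(1)            # placeholder for consumer.get_messages(...)
        if time.time() - last >= 5:
            log_rss("consumer loop")
            last = time.time()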
@joe-keen 1.0.2 was just released. If you have time, you might test it out and see if you see the same problems.
I am unable to reproduce -- please reopen if you see similar behavior on the latest release. |
kafka-python version 1.0 was just released and has several issues. There are performance regressions (see dpkp/kafka-python#551) and the interface has changed. Safest to set the upper bound to < 1.0 until the issues are resolved. Change-Id: I5921d1e7659fd41cff7bd5cddad4db0c015e671b
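A minimal sketch of what such a pin could look like in a project's setup.py; the package metadata below is purely illustrative:

    from setuptools import setup

    setup(
        name='example-project',          # placeholder project name
        version='0.1.0',
        # Cap kafka-python below 1.0 until the regressions referenced above
        # (dpkp/kafka-python#551 among others) are resolved.
        install_requires=['kafka-python<1.0'],
    )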
Patch Set 1: Some of the issues I'm concerned about: dpkp/kafka-python#674, dpkp/kafka-python#686, dpkp/kafka-python#579, dpkp/kafka-python#551
I have a test script that uses a KeyedProducer and a SimpleConsumer to test consumer performance at various fetch sizes, starting at 50K and going up to 2M in increments of 25K.
In 0.9.5 I see a read speed from the SimpleConsumer of 50K to 60K messages per second, depending on the fetch size. In 1.0, using the same test script and the same objects, I see speeds in the 7K to 10K range, and it segfaults before it reaches the 2M fetch size. I've seen segfaults at fetch sizes as low as 225K.
Attached is the test script I use.
kafka_test.zip
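For readers without the attachment, a rough sketch of the kind of benchmark loop described above. This is not the attached script; the broker address, topic name, message count, payload, and the mapping of "fetch size" onto SimpleConsumer's fetch parameters are all assumptions:

    import time

    # In 0.9.5 the legacy client class is KafkaClient; in 1.0 it is SimpleClient.
    from kafka import SimpleClient, SimpleConsumer, KeyedProducer

    BROKERS = 'localhost:9092'   # placeholder broker address
    TOPIC = 'perf-test'          # placeholder topic name
    NUM_MESSAGES = 100000        # placeholder message count

    client = SimpleClient(BROKERS)

    # Produce a batch of keyed messages to read back.
    producer = KeyedProducer(client)
    for i in range(NUM_MESSAGES):
        key = ('key-%d' % (i % 10)).encode('utf-8')
        producer.send_messages(TOPIC, key, b'x' * 100)

    # Read them back at increasing fetch sizes and report messages/sec.
    for fetch_size in range(50 * 1024, 2 * 1024 * 1024 + 1, 25 * 1024):
        consumer = SimpleConsumer(client, 'perf-group', TOPIC,
                                  max_buffer_size=None,
                                  fetch_size_bytes=fetch_size)
        consumer.seek(0, 0)      # rewind to the beginning of the topic
        start = time.time()
        count = 0
        for _ in consumer:
            count += 1
            if count >= NUM_MESSAGES:
                break
        elapsed = time.time() - start
        print('fetch_size=%d: %.0f msgs/sec' % (fetch_size, count / elapsed))
        consumer.stop()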