Epp-proxy hung #30
Comments
If the node actually crashed, there should be a crash dump file. You can use this script to extract readable data from it, and based on that info you can see where the problem might be: https://github.com/ferd/recon/blob/master/script/erl_crashdump_analyzer.sh
If no crash dump was made, which is possible but unlikely, the other option is to create a process that logs system stats over time. Judging by the description, the node may be leaking atoms or ports over time.
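As a rough illustration of logging stats over time, here is a minimal sketch of an external sampler in Python. It assumes a Linux host with `/proc` available, and the function names `parse_rss_kib` and `sample` are hypothetical, not part of epp-proxy. It only sees resident memory and open file descriptors from outside the node; atom and port counts would still have to be logged from inside the Erlang VM.

```python
import os
import time


def parse_rss_kib(status_text):
    """Extract VmRSS (resident memory in KiB) from /proc/<pid>/status text.

    Returns None when the field is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def sample(pid):
    """Take one sample of resident memory and open descriptors (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        rss_kib = parse_rss_kib(f.read())
    # Open descriptors are only a rough outside view of sockets and files.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return {"time": time.time(), "rss_kib": rss_kib, "open_fds": open_fds}
```

Calling `sample(pid)` once a minute from a cron job or a loop and appending the result to a log file would give a history that survives even if the node is killed without warning.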
Thank you @maciej-szlosarczyk, will do exactly that.
No luck, I could not find the crash dump file. We will continue to monitor closely and try to figure out how to improve logging.
It crashed again after 4 weeks of uptime. Judging by the memory usage graph, it looks like a memory leak. Everything ran fine for most of that time, then over roughly the last 72 hours memory usage kept climbing until it reached 100% and epp-proxy crashed.
I think the crash dump is missing because the process was killed by the OOM killer, which uses SIGKILL rather than SIGTERM, so the process had no chance to produce a crash dump.
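The difference between the two signals can be shown with a small Python sketch (Python is used here only for illustration, and the helper name `can_install_handler` is hypothetical). On Unix-like systems a process may register a handler for SIGTERM, but the kernel refuses any handler for SIGKILL, so a process killed that way cannot run cleanup code of any kind.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to handle the given signal."""
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except (OSError, ValueError):
        # The OS rejects handlers for signals such as SIGKILL.
        return False
    # Restore whatever was installed before so the probe has no side effects.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True


print("SIGTERM can be handled:", can_install_handler(signal.SIGTERM))
print("SIGKILL can be handled:", can_install_handler(signal.SIGKILL))
```

This is why monitoring has to record memory usage before the kill happens: nothing can be written at the moment of a SIGKILL.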
I have a general idea where the issue might be: the xmerl library can reserve a lot of memory for itself, and that memory is never garbage collected. I'll look into replacing it with a library that does not have that flaw when I get a free moment, but in the meantime I added a logger that should show us whether the memory keeps increasing over time or just spikes at one moment.
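To tell those two patterns apart in logged samples, something like the following Python sketch could be used. It is not the logger added to epp-proxy. The function name `classify_memory_trend` and the 1.5x threshold are assumptions chosen for illustration, and the input is assumed to be a list of memory readings taken at regular intervals.

```python
from statistics import mean, median


def classify_memory_trend(samples, threshold=1.5):
    """Label a series of memory readings as stable, growing, or spiking.

    The baseline is the median of the first half of the series. A reading
    counts as elevated when it reaches baseline * threshold.
    """
    if len(samples) < 4:
        raise ValueError("need at least 4 samples")
    first_half = samples[: len(samples) // 2]
    tail = samples[-max(1, len(samples) // 4):]
    baseline = median(first_half)
    if max(samples) < baseline * threshold:
        return "stable"
    if mean(tail) >= baseline * threshold:
        # Memory is still high at the end of the window: looks like a leak.
        return "sustained growth"
    # Memory went up but came back down: a one-off spike.
    return "transient spike"
```

A series that stays flat and then climbs until the end, as in the 72-hour pattern described earlier in this thread, would be labelled as sustained growth, while a single burst that returns to the baseline would be labelled as a transient spike.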
Looks OK, we have not had this issue for some time.
After about two weeks of uninterrupted operation, epp-proxy stopped logging, froze soon after, and caused an EPP service outage of about 30 minutes. There were no errors in the logs.
This resembles a memory leak. Further investigation is required.