Epp-proxy hung #30

Closed
vohmar opened this issue Jan 29, 2020 · 7 comments
Labels
EPP-Proxy Issue in the EPP Proxy project

Comments

@vohmar
Contributor

vohmar commented Jan 29, 2020

After about two weeks of uninterrupted operation, epp-proxy stopped logging, froze soon after, and caused an EPP service outage of about 30 minutes. There were no errors in the logs.

This resembles a memory leak. Further investigation is required.

@vohmar vohmar added the EPP-Proxy Issue in the EPP Proxy project label Jan 29, 2020
@maciej-szlosarczyk
Contributor

If the node actually crashed, there should be an erl_crash.dump file containing information about the system. It should be in the same folder as the codebase.

You can use this script to extract readable data from it. Based on that info you can see where the problem might be:

https://github.com/ferd/recon/blob/master/script/erl_crashdump_analyzer.sh

If no crash dump was made, which is possible but unlikely, the other option is to create a process that logs system stats over time. Judging by the description, it could be leaking either atoms or ports over time.
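As a rough illustration of logging stats over time, here is a minimal Python sketch of an external sampler. It is an assumption-laden stand-in, not the in-node process suggested above: it only records the resident memory of one OS process from `/proc/<pid>/status` on Linux and cannot see atom or port counts, which would have to be collected inside the Erlang node. The function names, the log file name, and the 60 second interval are all hypothetical.

```python
import re
import time
from pathlib import Path

# Matches the resident-set-size line of /proc/<pid>/status, e.g. "VmRSS:    123456 kB".
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB$", re.MULTILINE)


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None if absent."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def log_rss_until_exit(pid, interval_s=60.0, log_path="epp_proxy_rss.log"):
    """Append one 'unix_timestamp rss_kib' line per interval until the process is gone.

    The log path and interval are illustrative defaults, not project settings.
    """
    status_file = Path(f"/proc/{pid}/status")
    with open(log_path, "a", encoding="utf-8") as log:
        while True:
            try:
                text = status_file.read_text()
            except (FileNotFoundError, ProcessLookupError):
                break  # the process has exited, so stop sampling
            log.write(f"{int(time.time())} {parse_vm_rss_kib(text)}\n")
            log.flush()  # keep the log usable even if this sampler is killed too
            time.sleep(interval_s)
```

Calling `log_rss_until_exit(pid)` with the PID of the running node would leave a plain-text series that can be plotted after the next incident.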

@vohmar
Contributor Author

vohmar commented Jan 30, 2020

Thank you, @maciej-szlosarczyk, will do exactly that.

@vohmar
Contributor Author

vohmar commented Jan 30, 2020

No luck, could not find the crash dump file. We will continue monitoring closely and try to figure out how to improve logging.

@vohmar
Contributor Author

vohmar commented Feb 26, 2020

It crashed again after 4 weeks of uptime. Judging by the memory usage graph, it looks like a memory leak. Everything ran fine most of the time, then over roughly the last 72 hours memory usage kept climbing until it reached 100% and epp-proxy crashed.

@teadur
Contributor

teadur commented Feb 26, 2020

If the node actually crashed, there should be an erl_crash.dump file containing information about the system. It should be in the same folder as the codebase.

You can use this script to extract readable data from it. Based on that info you can see where the problem might be:

https://github.com/ferd/recon/blob/master/script/erl_crashdump_analyzer.sh

If no crash dump was made, which is possible but unlikely, the other option is to create a process that logs system stats over time. Judging by the description, it could be leaking either atoms or ports over time.

I think the crash dump is missing because the process was killed by the OOM killer, which uses SIGKILL rather than SIGTERM, so the process had no chance to write the crash dump.
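One way to check the OOM theory is to look for kill records in the kernel log. The Python sketch below is a minimal example under stated assumptions: it expects kernel log lines containing the phrase `Killed process <pid> (<name>)`, whose surrounding wording varies between kernel versions, and it assumes the Erlang VM shows up under a process name containing `beam`. The function name and the default filter are hypothetical.

```python
import re

# The kernel reports OOM kills with a phrase like "Killed process 4242 (beam.smp)".
_OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines, name_substring="beam"):
    """Return (pid, process_name, line) for OOM kills whose name contains name_substring."""
    hits = []
    for line in log_lines:
        match = _OOM_KILL_RE.search(line)
        if match and name_substring in match.group(2):
            hits.append((int(match.group(1)), match.group(2), line.rstrip("\n")))
    return hits
```

The input can be any iterable of lines, for example the saved output of `dmesg` or `journalctl -k` opened as a text file. An empty result does not rule out an OOM kill, since the kernel log may have rotated since the incident.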

@maciej-szlosarczyk
Contributor

I have a general idea where the issue might be: the xmerl library can reserve a lot of memory for itself that is never garbage collected. I'll look into replacing it with a library that does not have that flaw when I get a free moment. In the meantime, I added a logger that should show us whether the memory keeps increasing over time or just spikes at one moment.
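To make the "keeps increasing or just spikes" distinction concrete, here is a small Python sketch that classifies a series of logged memory readings. It is not the logger added to the project, only a crude heuristic for reading its output: the function name and the 0.5 threshold are hypothetical, and with very few samples the result is not meaningful.

```python
def describe_memory_trend(samples, spike_share=0.5):
    """Classify chronological memory readings as 'no growth', 'spike', or 'steady growth'.

    samples: readings in any consistent unit, oldest first.
    spike_share: illustrative threshold; if a single step accounts for more than
    this share of the total increase, the series is called a spike.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    total_increase = samples[-1] - samples[0]
    if total_increase <= 0:
        return "no growth"
    # Differences between consecutive readings.
    steps = [later - earlier for earlier, later in zip(samples, samples[1:])]
    if max(steps) > spike_share * total_increase:
        return "spike"
    return "steady growth"
```

A leak of the kind described in this thread would be expected to show up as steady growth, while a single oversized request would look more like a spike.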

@vohmar
Contributor Author

vohmar commented Oct 15, 2020

Looks OK, have not had this issue for some time.

@vohmar vohmar closed this as completed Oct 15, 2020
4 participants