Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ejabberd_zlib_drv: avoid overflow when OOM #1134

Merged
merged 1 commit into from
Jan 3, 2017
Merged

Conversation

msantos
Copy link
Contributor

@msantos msantos commented Dec 24, 2016

No description provided.

Copy link
Contributor

@michalwski michalwski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @msantos! I have 2 questions:

  1. Could you describe briefly when does this help and how is it suppose to behave in case of OOM?
  2. See the question in code.

@@ -42,6 +42,40 @@ typedef struct {
z_stream *i_stream;
} ejabberd_zlib_data;

/* Wrappers around driver_alloc() that check */
/* for OOM. */
void erl_exit(int n, char* v, ...) { abort(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if abort is the best choice here. How about driver_failure?

@msantos
Copy link
Contributor Author

msantos commented Jan 2, 2017

  1. driver_alloc() is basically a wrapper around the system malloc(). malloc() may return NULL under certain conditions leading to a buffer overflow. This change will crash the VM if this occurs.

  2. Use of erl_exit()/erts_exit() depend on the erlang version. abort() is a workaround from the fast_tls driver since mongooseim supports erlang 17, 18 and 19. Upstream I added a compile time check:

processone/ezlib@e7751e5

I can add this check in another PR if wanted.

driver_failure() is reasonable if the system will deal gracefully with having the zlib driver unloaded in a running system. Calling erts_exit() follows what the erlang inet driver does in OOM conditions.

  1. Any interest in upstreaming the local patches and using the ezlib driver?

@michalwski
Copy link
Contributor

Thanks for the explanation, now I understand your work better. Did run into the issue on your system? Is it somehow connected to ZLib bomb attack? https://en.wikipedia.org/wiki/Zip_bomb.

@msantos
Copy link
Contributor Author

msantos commented Jan 2, 2017 via email

@michalwski
Copy link
Contributor

Regarding the zlib bomb, we are already trying to prevent such an attack. Take a look at the size_limit parameter. It specifies the maximum size of unzipped data. The size is configurable so you can set it to your needs. Do you think this would help?

@msantos
Copy link
Contributor Author

msantos commented Jan 2, 2017

Yes, that option is a great way of preventing zip bombs. Unfortunately there are many ways to force a system to run out of memory. Checking for memory allocation failure prevents undefined behaviour in case of error.

@michalwski
Copy link
Contributor

Checking for memory allocation failure prevents undefined behaviour in case of error.

Right, I have only one issue with it - stopping the entire VM. Maybe, it would be ok, to just stop the port driver and linked ejabberd_c2s process, so only one user is affected. This should free some memory also.

@msantos
Copy link
Contributor Author

msantos commented Jan 2, 2017

Assuming the inet or fast_tls drivers don't stop the VM first, the behaviour today is:

  • on linux, by default the OOM killer terminates the VM by sending a SIGKILL

  • on other systems or if the OOM killer is disabled on linux, anything can happen but probably the VM will segfault

So it may be possible to unload the driver and have a functioning system but the next call into the network may force the VM to abort anyway (and hopefully be restarted by heart).

If we decide to go with driver_failure(), it will be difficult to simulate an OOM condition to test if everything continues to work. One idea might be to add a "fail" command to the driver that simply returns driver_failure() to force the driver to unload during the test run.

@michalwski
Copy link
Contributor

Ok, this sounds reasonable. Thanks for explanations and your contribution.

@michalwski michalwski merged commit 7de6ece into esl:master Jan 3, 2017
@michalwski
Copy link
Contributor

One more thing. Could you create another PR regarding the Erlang version check?

Use of erl_exit()/erts_exit() depend on the erlang version. abort() is a workaround from the fast_tls driver since mongooseim supports erlang 17, 18 and 19. Upstream I added a compile time check:
processone/ezlib@e7751e5
I can add this check in another PR if wanted.

@msantos msantos deleted the alloc branch January 4, 2017 23:13
msantos added a commit to msantos/MongooseIM that referenced this pull request Jan 9, 2017
Set CFLAGS in rebar.config.script based on the emulator version as
discussed in:

esl#1134
msantos added a commit to msantos/MongooseIM that referenced this pull request Jan 9, 2017
Set CFLAGS in rebar.config.script based on the emulator version as
discussed in:

esl#1134
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants