Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add VM reentry limit #5135

Closed
wants to merge 1 commit into from
Closed

Add VM reentry limit #5135

wants to merge 1 commit into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Jan 31, 2020

This adds a zend.vm_reentry_limit option to fix https://bugs.php.net/bug.php?id=64196 and the many linked related bugs.

The VM reentry limit is intentionally named somewhat obscurely: It is not a direct limit on recursion. It only counts the number of VM reentries, because that's what can eventually lead to a stack overflow.

This means that unbounded recursion is still supported just fine, and this affects only some specific cases, like recursion through __destruct(), where we don't support unbounded recursion for technical reasons. As such, applications should never actually hit this limit, even if they make heavy use of recursion (AST processing etc.)

The only exception are extension that for VM reentry for every single function call. Those extensions should probably either increase or disable this limit. Though there's also work underway from APM vendors to add the necessary engine support, so they don't need to force VM reentry anymore.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, except for one missing EG(vm_reentry_count)--

Zend/zend_vm_execute.h Outdated Show resolved Hide resolved
@@ -175,6 +175,7 @@ ZEND_INI_BEGIN()
STD_ZEND_INI_BOOLEAN("zend.signal_check", "0", ZEND_INI_SYSTEM, OnUpdateBool, check, zend_signal_globals_t, zend_signal_globals)
#endif
STD_ZEND_INI_BOOLEAN("zend.exception_ignore_args", "0", ZEND_INI_ALL, OnUpdateBool, exception_ignore_args, zend_executor_globals, executor_globals)
STD_ZEND_INI_ENTRY("zend.vm_reentry_limit", "1000", ZEND_INI_ALL, OnUpdateLong, vm_reentry_limit, zend_executor_globals, executor_globals)
Copy link
Contributor

@TysonAndre TysonAndre Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would a small minimum value make sense?

Right now, it's possible to set it to 0, which would prevent __destruct (etc) from being called at all, e.g. during request cleanup.

Values such as 0 may lead to false positive bug reports from users who misconfigured it, e.g. when setting this when debugging other issues and misreading the documentation.

LGTM other than the question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this tricky especially with extensions that overwrite zend_execute_ex the limit is certainly going to be hit by larger frameworks and template engines regularly, whereas for the goto vm mode it will never go very deep.

I believe 1000 could be too small.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion: adding some form of logging (of a severity not catchable by set_error_handler, because the error handler itself is re-entering?) before throwing an exception might help avoid confusion with catch blocks that don't perform any logging.

Without the additional logging, there would be no obvious reason why recursive AST processing would potentially silently fail to process the entire AST on some system configurations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensions that overwrite zend_execute_ex should set a higher limit for this ini setting, or disable it entirely. A higher limit doesn't really make sense for a default config (probably a much lower one would also do).

Copy link
Contributor

@beberlei beberlei Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic i am thinking in terms of developer experience, this segfault is something users can hardly understand (as you mentioned in your mailing list post). As such i would certainly like to not disable it when my extension is active.

With just 1000, all extensions just disable it magically and users run into these segfaults again, which are more likely to happen when you have an extension installed that uses a zend_execute_ex overwrite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beberlei So first of all, I do hope that your extension (and other APM extensions) will be able to avoid the VM recursion based on the new hooks that Levi & co were working on. But assuming that doesn't pan out (in time), you don't necessarily have to disable the option (that's just the conservative choice, that gives you exactly the behavior you get on PHP 7), you can also bump the limit to a larger value, based on an analysis of stack usage if your extension is enabled. I don't think that it makes sense to increase the default value further (I'd say 1000 is already very conservative), because it's unlikely to be useful without such an extension in play, but increases the chances of missing stack overflows (if you bump to say 10000).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TysonAndre I wouldn't want to introduce a special error handling mechanism just for this functionality -- but possibly it would be better to make this a fatal error instead of an exception? I mainly went with the exception because "I can" and it includes a stack trace that is likely helpful when dealing with infinite recursion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to make this a fatal error instead of an exception

That would be my preference - terminating cleanly instead of throwing an error would avoid a lot of confusing edge cases in internals and for end users. The programs would already be prone to terminating if they recursed that deeply, except for edge cases (e.g. __toString on a deeply nested tree structure).

My other concern was whether this breaks assumptions in opcache or for end users (e.g. certain opcode types will never throw at runtime, e.g. casting to string)

Error handling properly might be an issue

try {
     array_map(...);
} catch (Throwable $e) {
     echo "Caught $e";  // re-enters when calling __toString() and also throws
}

Also, if you hit the recursion limit, then __destruct() can't get called when an object is freed, which breaks other assumptions you might have about unserialize(), or assumptions users make about RAII or other patterns (e.g. connection or memory leaks)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic yes, i have tested levis new API extensively already and it will make this problem go away.

php.ini-development Outdated Show resolved Hide resolved
@dstogov
Copy link
Member

dstogov commented Apr 14, 2020

This will affect xdebug, xhprof and other debuggers/tracers/profilers that override execute_ex().

@beberlei
Copy link
Contributor

@dstogov @morrisonlevi and @SammyK are working to propose a new insturmentation API that would make execute_ex overwrites obsolete for many use-cases and would allow this re-entry limit to work: SammyK@21219b6

Extensions that still need to override execute_ex can incresae the VM entry limit on MINIT.

@cmb69
Copy link
Member

cmb69 commented Dec 28, 2021

Well, the observer API is available as of PHP 8.0.0, so what's the status here? It would be good to finally fix that bug. @krakjoe, thoughts?

@ramsey
Copy link
Member

ramsey commented May 31, 2022

What's the status here? Is this something we can get ready to merge to master for 8.2?

; This only affects recursion through magic method calls and similar mechanism.
; Some profiling, debugging or APM extensions might make this limit apply to plain
; recursion as well, in which case you may wish to raise it.
;zend.vm_reentry_limit = 1000
Copy link
Contributor

@mvorisek mvorisek Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following example https://3v4l.org/jJD62 results with segfault as well. No magic method used.

Does this PR fixes is as well and why does it happen in the first place? Does it mean array_map use the system stack, but when recursing php functions only, no system stack is used?

PHP does already support fibers. What about creating a Fiber internally on the start and unrolling to it on every like 1000th reentry so the system stack is never exhausted (always kept below 1000)?

This way, we do not need any php.ini and unlimited recursion will be supported for absolutely everything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When PHP code is called from C code (internal functions such as array_map() are implemented in C), the VM is re-entered by calling execute_ex(). If that happens way too often (which is almost always a programming error; infinite recursion), you get a stack overflow error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think Fiber can be used to fix that?

To level of my knowledge, it can solve exactly the issue - fiber (created on php start) can restore some shallow C stack once some reentry count is reached - but still access/modify the latest VM state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you can "just" unroll a machine stack.

Copy link
Contributor

@mvorisek mvorisek Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a working demo - https://3v4l.org/0Jjrr

I don't think that you can "just" unroll a machine stack.

this is what Fiber does - replace machine stack, thus if we replace the current machine stack with shallower machine stack (which we can easily construct on the start), we effectively "unroll" the working machine stack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the demo and I can run it (with 2G memory_limit) with 1M iterations/C nested calls.

@cmb69 can you please test it on your side as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvorisek you are asking to rewrite a core architecture of zend engine, its not as easy as you make it sound. A call to an internal fn must go through the internal (and not the userland) execution handler and this causes the stack to increase. Same for PHP extensions that use an old mechanism for instrumentation like NewRelic for example, they even require this for userland calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I have to add that I don't see the point in this. If a stack overflow happens due to indirect execution, you likely made a programming error, and ideally want to have that reported, which is exactly what this PR is about.

If there are still concerns regarding extensions overriding zend_execute_ex() this could be catered to, possibly by ignoring the zend.vm_reentry_limit in that case.

The only thing we might need to consider are fibers; it might be reasonable to have an own vm_reentry_count for each fiber.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is the machine stack is limited to about 2k VM reentries [1]. One can argue this is "likely made a programming error", but for deeply recursive algorithms with magic methods, any hard limit can be too limiting.

So I propose to use C fiber to overcome the C/host stack limitation by using a machine stack "unroll" as described /w PoC above. This solution does not need any arbitrary limit and if infine recurse will occur, the execution will stop on standard OOM.

I think this solves the real "VM reentry" limitation and it does not introduce any "for some cases recursion limit" [2]. I strongly belive, from an user perspective, any function invocation should behave the same, eg. behaviour/limitation should be the same for php function called directly vs. called from C function.

[1] see PR description, "VM reentry" happen when "When PHP code is called from C code (internal function"
[2] citing from PR description:

The` VM reentry limit is intentionally named somewhat obscurely: It is not a direct limit on recursion. It only counts the number of VM reentries, because that's what can eventually lead to a stack overflow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose

This sounds like a may months project and will break many extensions.

For now we should use this to avoid the segfaults.

@arnaud-lb
Copy link
Member

How feasible/practical would it be to limit in terms of stack usage instead of VM reentries ?

Limiting in terms of stack usage would have some benefits:

  • It would be independent of the stack cost of reentering the VM
  • Extensions that still override execute_ex will not have to disable the setting
  • This could be used in recursive C code as well. For example in Try to avoid recursive compilation of binary ops #8315 (although not a perfect solution, this would be more user friendly)

V8 appears to have opted for this solution. They take note of the stack start at some point, and compare it to the current stack position at strategical places (e.g. on VM reentry, or in recursive code).

I think that this would have comparable runtime overhead.

From reading V8 code, this seems doable on Windows/Linux/FreeBSD/MacOS and others. The trickiest part seems to be to get the start of the stack, but we can fallback to taking the current stack position early in the SAPI entry, for similar results.

We can get the current stack position with __builtin_return_address(0), which seems supported on all platforms by GCC

@cmb69
Copy link
Member

cmb69 commented Jul 8, 2022

Interesting idea, @arnaud-lb! I assume that the runtime overhead very much depends on where we do these checks; if we're doing them only on VM entry, it should be roughly the same. If we add them to other places as well, that may make a huge difference (although I don't think that adding them during compilation would be a real issue). Might be worth closer investigation.

@arnaud-lb arnaud-lb mentioned this pull request Jul 22, 2022
11 tasks
@Girgias
Copy link
Member

Girgias commented Feb 5, 2023

@arnaud-lb is this PR still relevant now that #9104 was merged?

@dstogov
Copy link
Member

dstogov commented Feb 6, 2023

@arnaud-lb is this PR still relevant now that #9104 was merged?

I think #9104 already solved the problem in a better way (may be a bit incompletely).

@Girgias
Copy link
Member

Girgias commented Feb 6, 2023

@arnaud-lb is this PR still relevant now that #9104 was merged?

I think #9104 already solved the problem in a better way (may be a bit incompletely).

Let's close this then, and if a related issue arises again we can always go back to this. :)

@Girgias Girgias closed this Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants