-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VM reentry limit #5135
Add VM reentry limit #5135
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine, except for one missing EG(vm_reentry_count)--
7d38d0b
to
f6d9fd5
Compare
@@ -175,6 +175,7 @@ ZEND_INI_BEGIN() | |||
STD_ZEND_INI_BOOLEAN("zend.signal_check", "0", ZEND_INI_SYSTEM, OnUpdateBool, check, zend_signal_globals_t, zend_signal_globals) | |||
#endif | |||
STD_ZEND_INI_BOOLEAN("zend.exception_ignore_args", "0", ZEND_INI_ALL, OnUpdateBool, exception_ignore_args, zend_executor_globals, executor_globals) | |||
STD_ZEND_INI_ENTRY("zend.vm_reentry_limit", "1000", ZEND_INI_ALL, OnUpdateLong, vm_reentry_limit, zend_executor_globals, executor_globals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a small minimum value make sense?
Right now, it's possible to set it to 0, which would prevent __destruct (etc) from being called at all, e.g. during request cleanup.
Values such as 0 may lead to false positive bug reports from users who misconfigured it, e.g. when setting this when debugging other issues and misreading the documentation.
LGTM other than the question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this tricky especially with extensions that overwrite zend_execute_ex
the limit is certainly going to be hit by larger frameworks and template engines regularly, whereas for the goto vm mode it will never go very deep.
I believe 1000 could be too small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another suggestion: adding some form of logging (of a severity not catchable by set_error_handler, because the error handler itself is re-entering?) before throwing an exception might help avoid confusion with catch
blocks that don't perform any logging.
Without the additional logging, there would be no obvious reason why recursive AST processing would potentially silently fail to process the entire AST on some system configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extensions that overwrite zend_execute_ex should set a higher limit for this ini setting, or disable it entirely. A higher limit doesn't really make sense for a default config (probably a much lower one would also do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic i am thinking in terms of developer experience, this segfault is something users can hardly understand (as you mentioned in your mailing list post). As such i would certainly like to not disable it when my extension is active.
With just 1000, all extensions just disable it magically and users run into these segfaults again, which are more likely to happen when you have an extension installed that uses a zend_execute_ex overwrite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beberlei So first of all, I do hope that your extension (and other APM extensions) will be able to avoid the VM recursion based on the new hooks that Levi & co were working on. But assuming that doesn't pan out (in time), you don't necessarily have to disable the option (that's just the conservative choice, that gives you exactly the behavior you get on PHP 7), you can also bump the limit to a larger value, based on an analysis of stack usage if your extension is enabled. I don't think that it makes sense to increase the default value further (I'd say 1000 is already very conservative), because it's unlikely to be useful without such an extension in play, but increases the chances of missing stack overflows (if you bump to say 10000).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TysonAndre I wouldn't want to introduce a special error handling mechanism just for this functionality -- but possibly it would be better to make this a fatal error instead of an exception? I mainly went with the exception because "I can" and it includes a stack trace that is likely helpful when dealing with infinite recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to make this a fatal error instead of an exception
That would be my preference - terminating cleanly instead of throwing an error would avoid a lot of confusing edge cases in internals and for end users. The programs would already be prone to terminating if they recursed that deeply, except for edge cases (e.g. __toString on a deeply nested tree structure).
My other concern was whether this breaks assumptions in opcache or for end users (e.g. certain opcode types will never throw at runtime, e.g. casting to string)
Error handling properly might be an issue
try {
array_map(...);
} catch (Throwable $e) {
echo "Caught $e"; // re-enters when calling __toString() and also throws
}
Also, if you hit the recursion limit, then __destruct() can't get called when an object is freed, which breaks other assumptions you might have about unserialize(), or assumptions users make about RAII or other patterns (e.g. connection or memory leaks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic yes, i have tested levis new API extensively already and it will make this problem go away.
This will affect xdebug, xhprof and other debuggers/tracers/profilers that override execute_ex(). |
@dstogov @morrisonlevi and @SammyK are working to propose a new insturmentation API that would make execute_ex overwrites obsolete for many use-cases and would allow this re-entry limit to work: SammyK@21219b6 Extensions that still need to override execute_ex can incresae the VM entry limit on MINIT. |
ac8f36b
to
df12728
Compare
Well, the observer API is available as of PHP 8.0.0, so what's the status here? It would be good to finally fix that bug. @krakjoe, thoughts? |
What's the status here? Is this something we can get ready to merge to master for 8.2? |
; This only affects recursion through magic method calls and similar mechanism. | ||
; Some profiling, debugging or APM extensions might make this limit apply to plain | ||
; recursion as well, in which case you may wish to raise it. | ||
;zend.vm_reentry_limit = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following example https://3v4l.org/jJD62 results with segfault as well. No magic method used.
Does this PR fixes is as well and why does it happen in the first place? Does it mean array_map
use the system stack, but when recursing php functions only, no system stack is used?
PHP does already support fibers. What about creating a Fiber internally on the start and unrolling to it on every like 1000th reentry so the system stack is never exhausted (always kept below 1000)?
This way, we do not need any php.ini and unlimited recursion will be supported for absolutely everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When PHP code is called from C code (internal functions such as array_map()
are implemented in C), the VM is re-entered by calling execute_ex()
. If that happens way too often (which is almost always a programming error; infinite recursion), you get a stack overflow error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think Fiber can be used to fix that?
To level of my knowledge, it can solve exactly the issue - fiber (created on php start) can restore some shallow C stack once some reentry count is reached - but still access/modify the latest VM state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that you can "just" unroll a machine stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a working demo - https://3v4l.org/0Jjrr
I don't think that you can "just" unroll a machine stack.
this is what Fiber does - replace machine stack, thus if we replace the current machine stack with shallower machine stack (which we can easily construct on the start), we effectively "unroll" the working machine stack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the demo and I can run it (with 2G memory_limit) with 1M iterations/C nested calls.
@cmb69 can you please test it on your side as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mvorisek you are asking to rewrite a core architecture of zend engine, its not as easy as you make it sound. A call to an internal fn must go through the internal (and not the userland) execution handler and this causes the stack to increase. Same for PHP extensions that use an old mechanism for instrumentation like NewRelic for example, they even require this for userland calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I have to add that I don't see the point in this. If a stack overflow happens due to indirect execution, you likely made a programming error, and ideally want to have that reported, which is exactly what this PR is about.
If there are still concerns regarding extensions overriding zend_execute_ex()
this could be catered to, possibly by ignoring the zend.vm_reentry_limit
in that case.
The only thing we might need to consider are fibers; it might be reasonable to have an own vm_reentry_count
for each fiber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is the machine stack is limited to about 2k VM reentries [1]. One can argue this is "likely made a programming error", but for deeply recursive algorithms with magic methods, any hard limit can be too limiting.
So I propose to use C fiber to overcome the C/host stack limitation by using a machine stack "unroll" as described /w PoC above. This solution does not need any arbitrary limit and if infine recurse will occur, the execution will stop on standard OOM.
I think this solves the real "VM reentry" limitation and it does not introduce any "for some cases recursion limit" [2]. I strongly belive, from an user perspective, any function invocation should behave the same, eg. behaviour/limitation should be the same for php function called directly vs. called from C function.
[1] see PR description, "VM reentry" happen when "When PHP code is called from C code (internal function"
[2] citing from PR description:
The` VM reentry limit is intentionally named somewhat obscurely: It is not a direct limit on recursion. It only counts the number of VM reentries, because that's what can eventually lead to a stack overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose
This sounds like a may months project and will break many extensions.
For now we should use this to avoid the segfaults.
How feasible/practical would it be to limit in terms of stack usage instead of VM reentries ? Limiting in terms of stack usage would have some benefits:
V8 appears to have opted for this solution. They take note of the stack start at some point, and compare it to the current stack position at strategical places (e.g. on VM reentry, or in recursive code). I think that this would have comparable runtime overhead. From reading V8 code, this seems doable on Windows/Linux/FreeBSD/MacOS and others. The trickiest part seems to be to get the start of the stack, but we can fallback to taking the current stack position early in the SAPI entry, for similar results. We can get the current stack position with |
Interesting idea, @arnaud-lb! I assume that the runtime overhead very much depends on where we do these checks; if we're doing them only on VM entry, it should be roughly the same. If we add them to other places as well, that may make a huge difference (although I don't think that adding them during compilation would be a real issue). Might be worth closer investigation. |
@arnaud-lb is this PR still relevant now that #9104 was merged? |
I think #9104 already solved the problem in a better way (may be a bit incompletely). |
Let's close this then, and if a related issue arises again we can always go back to this. :) |
This adds a
zend.vm_reentry_limit
option to fix https://bugs.php.net/bug.php?id=64196 and the many linked related bugs.The VM reentry limit is intentionally named somewhat obscurely: It is not a direct limit on recursion. It only counts the number of VM reentries, because that's what can eventually lead to a stack overflow.
This means that unbounded recursion is still supported just fine, and this affects only some specific cases, like recursion through
__destruct()
, where we don't support unbounded recursion for technical reasons. As such, applications should never actually hit this limit, even if they make heavy use of recursion (AST processing etc.)The only exception are extension that for VM reentry for every single function call. Those extensions should probably either increase or disable this limit. Though there's also work underway from APM vendors to add the necessary engine support, so they don't need to force VM reentry anymore.