-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call stack exhaustion found via CPython's test_marshal
test suite around October 19th/20th
#5152
Comments
/cc @tiran |
A few bits of info that would help, if you wouldn't mind investigating further on your end:
Nothing major changed in regalloc or codegen between 1.0 and 2.0 that would lead to a large expansion of stackframe size, at least off the top of my head, but if the above investigation shows that we somehow have a much lower limit, we can of course dig in and see what's going on. Thanks! |
I haven't tried reproducing this locally, but I was curious and looked through the failing test's source code a bit. I'm guessing the specific failing test is That test expects to be able to recurse over a data structure nested about 2,000 levels deep, and expects So I guess that makes this a "quantitative" recursion bug. |
I tried to download the last successful build's copy of import marshal
head = last = []
MAX_MARSHAL_STACK_DEPTH = 2000
for i in range(MAX_MARSHAL_STACK_DEPTH - 2):
last.append([0])
last = last[-1]
data = marshal.dumps(head)
new_head = marshal.loads(data) this succeeded:
for all of On an aaarch64 machine v1.0.0 and Basically I unfortunately am unable to easily reproduce, and at least on AArch64 I can see a small difference in stack usage but nothing meaningful. I'd echo what @cfallin said and reiterate @brettcannon if you're able it would be very helpful to reduce this issue to ideally "here's a wasm file and a directory tree and a command that fails." We can probably take it from there but without that it may be unlikely much can be done here. |
Ok with some further testing and hammering I was able to get something. At 29c7de7 it takes 497712 bytes of stack to complete this test. At the next commit, 9715d91, it takes 528752 bytes of stack. The default allowed stack is 524288, which is why this is stack overflowing. The stack required increased by 6%. Of the two co-recursive functions, Overall I think that should fully explain what's going on here. A small increase in stack from I'll leave @jameysharp to conclude whether #4966 should have increased the stack size here or not. I've extracted the single function to this wasm file: extract.wasm.gz where I get:
locally on an x86_64 target. @brettcannon to solve this issue for you in the meantime I believe there's a few options:
We could alternatively consider increasing the default stack size in a platform-specific manner. I believe the current limit is relatively restrictive due to our testing on Windows where the main thread stack was smaller by default than other platforms. Ideally though we could keep consistent platform behavior. |
Thank you @alexcrichton for bisecting this! I too am curious why exactly we use 16 extra bytes in the stackframe in this function. We can look into that a bit; however I think the high-order bit here, actually much more important, is to set expectations/norms around this sort of thing, so I'm going to go ahead and give some thoughts on that. (I don't mean to pick on this particular instance of the issue @brettcannon -- this is just a useful "norm-setting opportunity", so I'll make a general point.) In general, I want to make sure we don't (accidentally or otherwise) fall into a "no regressions ever" expectation. The size of a stackframe (number of spillslots) is implementation-defined, and a Wasm compiler is free to be as efficient or as inefficient as it likes. A test (such as the one linked above) that depends on the stack size fitting just under some implementation-defined limit, given implementation-defined stackframe sizes, is a really brittle test: it could break with any minor version update. In other words, I want to be very clear here first that Wasmtime/Cranelift are perfectly standards-compliant here, and so in a strict sense, this is not a bug. I would consider a "blow up by factor of N" regression a performance bug, but this is not that: it's one extra spillslot. Of course, all else being equal, regressions are bad. Over time the compiler should improve on average, and we would never want to take a change that adds an extra 16 bytes to a stackframe with no other benefit. But that qualification is key: most things have tradeoffs. (In the case here, 4-6% faster compile times, according to the commit message.) The impact of a strict no-regressions policy is that one ends up climbing the hill to a local maximum, and one can never really change the design otherwise: the core algorithms/approach are ossified forever (or a new algorithm/approach needs to go to absolutely heroic lengths to match or exceed the old approach in every single benchmark, and is always "on call" to be "fixed" when new cases are discovered). Investigations of small N% regressions like this one can be useful because they can uncover real problems, or opportunities to fix a case. I'll also defer to @jameysharp whether he wants to dig into this one. I wouldn't be surprised however if it ends up being something like "values processed in slightly different order and spillslot-reuse heuristic now stops searching one step too soon" -- that sort of thing can easily happen in a complex compiler. I'd second @alexcrichton 's suggestion to set either the recursion depth or stack limit to widen the gap between the two a bit -- in other words the most sustainable approach here is to not live right "on the edge" of the limit. I don't want to discourage filing bugs for this sort of thing, as long as they have enough detail to be helpful; but I just want to make sure we're all on the same page re: acceptable outcomes. Happy to discuss this further if anyone would like! |
I agree with everything you say, and I, too, do not wish to have a policy of zero regressions accepted. Personally I feel that this sequence of events for this issue is ok to play out, though. As an end-user you probably aren't that aware of how close to the limits you're running which means that if a release, like this one, bumps you over the limit it looks like a legitimate regression rather than a small amount more stack space. In that sense I very much want to still encourage these bugs being opened so we can investigate and figure out a conclusion. On this particular but I personally see no issue in providing the |
I chatted about this offline with @cfallin and others. I'm a little puzzled at how #4966 could have impacted stack frame size. In particular, Chris says regalloc2 does a value-merging pre-processing step that sounds to me like it should have made the input to register allocation be the same with or without the PR. Perhaps merging values earlier is enabling other Cranelift optimizations that lead to longer live-ranges? It would certainly be nice to understand the root cause here. But I think @alexcrichton's So I'm closing this issue as I believe it's resolved by #5156. @brettcannon, if that solution isn't sufficient for you, feel free to re-open with more details about what you need. Thank you for your report! |
Thanks everyone for the details and looking into this! Sorry I couldn't react faster to the request for more info as I have been a bit busy with other things. 😅 As for the result, I'm totally happy with the answers. The issue was opened more to make sure we had not found a bug as CPython is somewhat notorious in doing in edge cases of platforms. 😁 I will lower the stack depth cap in CPython under WASI to a lower number (which is the key purpose of the test; to know when we need a lower stack depth so that we can make sure you don't overflow and we can raise an exception instead). |
I have opened python/cpython#98925 to track fixing this on the CPython side. |
Thanks for filing a bug report! Please fill out the TODOs below.
Note: if you want to report a security issue, please read our security policy!
Test Case
test_marshal
from Python's test suite.Steps to Reproduce
Easiest to just look at a GitHub Action run, e.g. https://github.com/tiran/cpython-wasm-test/actions/runs/3342891022/jobs/5535571299 .
Expected Results
The test to at least complete.
Actual Results
Call stack exhaustion which doesn't occur under wasmtime 1.0.1.
https://github.com/tiran/cpython-wasm-test/actions/runs/3294946299/jobs/5432988265#step:7:408
Versions and Environment
Wasmtime version or commit: latest
Operating system: Linux
Architecture: x64
Extra Info
This started occurring on October 22: https://github.com/tiran/cpython-wasm-test/actions/workflows/wasi.yml . As mentioned above, wasmtime 1.0.1 passes without issue and everything was fine until something changed between October 19th @ 22:04 and October 20th @ 22:13 for what https://wasmtime.dev/install.sh installs.
The text was updated successfully, but these errors were encountered: