-
-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: support for Fibers #171
Conversation
I don't understand why fiber tests are failing in Docker containers but not when running directly in GHA runners. Adjusting the stack size with Do you have any idea of what can be the difference on @withinboredom? |
I'll have to take a gander in the morning. At least for me, it looks like Docker has gone insane...
|
I'm an idiot and forgot |
.github/workflows/docker.yml
Outdated
@@ -136,9 +136,10 @@ jobs: | |||
if: '!matrix.qemu' | |||
continue-on-error: ${{fromJson(needs.prepare.outputs.push)}} | |||
run: | | |||
docker run --platform=${{matrix.platform}} --rm \ | |||
ulimit -s | |||
docker run --ulimit stack=$(ulimit -s)000 --platform=${{matrix.platform}} --rm \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed a positively ridiculous stack size to get the tests to pass in my local Docker.
docker run --ulimit stack=$(ulimit -s)000 --platform=${{matrix.platform}} --rm \ | |
docker run --ulimit stack=512000000 --platform=${{matrix.platform}} --rm \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very curious how PHP fibers work, under the hood, beyond my rudimentary knowledge. It'd be a really interesting blog post. 🤔
Maybe there are some changes to either Frankenphp or PHP itself that can make this better. A 5mb stack is pretty huge... Especially the zts version, where instead of trying to do everything in a single stack, could just spawn a thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this with fresh morning eyes. I'm going to write that blog post. I really want to understand what is happening in the stack here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with the value you suggested (last commit), it doesn't work :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing changed except the time of day and its failing locally with the same numbers I gave you (ran on the same commit). This makes no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would it fail the first few times and then pass afterward? There's some kind of race going on and it fails more than it passes, but ran it 10 times, it only passed 3 times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's flaky. Even on GHA sometimes test pass, sometimes no.
Could you try to compile PHP with --disable-fiber-asm
to see if this improves the situation?
06c62b0
to
49ed71f
Compare
So @dunglas, I was digging around
Maybe frankenphp needs to implement a go-specific fiber implementation? There's also the |
From: php/php-src#9104 That PR is amazing information about how the stack works. Something that Go might be screwing up:
Emphasis mine. I don't think the stack will be contiguous with Go involved. |
Thanks for the pointer, this looks interesting! Passing |
Yep, saw that as well. I wonder if we're running into the same situation as golang/go#39411 (comment) and the PHP fiber implementation is somehow messing the main stack? |
I added |
One thing sticks out to me:
Is Go doing GC while executing a script? Am I reading that right? |
GC is a red herring. I disabled it, and analysed a bunch of stack traces. There doesn't seem to really be much in common which makes me think this is on PHP's side, not Go's. Oh, and |
There's some asm in Go that switches to |
From cgocall.go:
hmmmm.... |
I've managed to figure out how to reproduce it reliably while a debugger is running. Yet stepping through the code is enough to work around the issue. This is certainly a race condition:
I've narrowed down the issue to somewhere in In fact, the fact that it requires a reboot (unless you're running in Docker!) almost makes it smell like a kernel bug, but I doubt it. Probably some little chunk of memory gets cached in a CPU cache line that makes it work after the first requests, but switching namespaces in the kernel gets it evicted. |
Requires a custom build of PHP, but I can no longer reproduce it locally...
|
@withinboredom thanks for your in-depth research on this topic, I very much appreciate it!! Unfortunately, I don't think that defining Passing the |
:sigh: ok, I have another lead, but that was the most promising. |
Funny enough, today I did run into it with the asm disabled locally (outside of tests). Anyway, I think I know what is going on. At least I'm fairly certain. This bug should affect ALL cgo programs, if I'm actually understanding things correctly. Essentially the issue is that we're passing a pointer from C to Go that contains pointers to Go structures... in Here's the thing. These pointers are owned by C structs, not Go structs. Thus they crash quite spectacularly when they don't get rewritten by I think this can be fixed by having a hash in Go that the C structs then reference (main context / current context). Essentially:
or something like that. Here's the thing, once a few requests are served, the stack stays pretty stable, thus that is why it is hard to reproduce on 'real servers' vs. these tests which keep the stack really small and thus easy to grow, especially the The moral of the story is that I think we cannot reference Go structs from C structs or vice-versa. |
Thanks for sharing your findings, this looks promising! Maybe are we violating the pointers passation rules in some way? Do you have already identified the offending pointers? Maybe could be avoid passing them from Go to C, or ping them? Also, we should set |
Ah, yep. I guess I accidentally hinted to them. It's the HTTP contexts from Caddy which are stored in the current/main contexts in Frankenphp's C code. |
I just reviewed the code again, and I don't think that we store any Go pointer in C memory. We use runtime/cgo.Handle (which is designed to not have to pass Go pointers to C). Do you have any GDB backtrace? (I still cannot reproduce locally, most likely because my CPU an Apple M1 which seems unaffected by this bug). |
I went down quite a rabbit hole trying to find a bad pointer. Handy bit of C: static void debug_log(char *data) {
pid_t id = gettid();
fprintf(stderr, "tid %d: %s", id, data);
}
This is beyond my expertise to resolve now, as I'm not intimately familiar with Go and assembly. Here's a minimum diff to get tests to run on this branch (though they still fail due to not outputting anything) without crashing, uncommenting either one of those functions will cause the crash when running tests. Apparently, this may be amd64 specific. I can reproduce this in an ubuntu 22.04 VM without fail, or a docker container. On my baremetal Linux laptop (pop-os 22.04), I cannot reproduce it reliably (if at all). Now I'm wondering if this is ubuntu specific... but I'll have to test that later, it's 1am. cc: @krakjoe |
@withinboredom here is why: golang/go#62130 (comment) We'll likely have to patch Go to add support for Fibers. |
I wonder if we can do this in C.
|
Huh, dunno how I missed this, but |
Regarding |
Double-checked and verified. Running 8.3-dev fwiw. |
In fact, building with It looks like when it tries to run |
Doing this: static void disable_timers() {
#ifdef ZEND_MAX_EXECUTION_TIMERS
// unset default timeout
// TODO: add support for max_input_time
zend_unset_timeout();
#endif
}
static void restart_timers() {
#ifdef ZEND_MAX_EXECUTION_TIMERS
// Reset default timeout
// TODO: add support for max_input_time
zend_set_timeout(INI_INT("max_execution_time"), 0);
#endif
} and wrapping calls to go that happen during scripts also seems to work. It'd be handy if |
Turns out to be that PHP isn't installing the signal handlers properly. I'm opening a PR over there to get it fixed in 8.2/8.3 -- well, hopefully. I've never opened a PR over there. |
I can assist you and review if you need any help (it's probably related to changes I've done to PHP). Feel free to contact me in DM! |
I was going to ask if you'd tried toggling whether ZEND_SIGNALS was (en|dis)abled. |
b5f30d4
to
f0f4eed
Compare
Fixed by #1137. |
Continuation of #99. Closes #46, closes #99.