Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling on Cygwin for exceptions that occur during exception handling on an alt stack #83

Closed
wants to merge 5 commits into from

Conversation

embray
Copy link
Collaborator

@embray embray commented Mar 26, 2018

This is a rough idea for how to fix #77. Actually it seems to work pretty reliably but I admit I haven't fully worked out the implications for continuing to do anything with Cygwin in this environment. It's probably pretty unsafe, but this is still better than the current situation (where the OS just quietly terminates the process, and never even sets the exit code in a way that Cygwin can keep track of).


/* Additional platform-specific implementation code */
#if defined(__CYGWIN__) && defined(__x86_64__)
#include "implementation_cygwin.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find an #include in the middle of the code bad style. Can you put it near the top of the file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, though though the file's contents depends on some inline functions I added above. Without it I get complaints about missing declarations--in that case it might almost be better to add an implementation.h and move the declarations there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it I get complaints about missing declarations

Just add declarations in the beginning of implementation.c.

in that case it might almost be better to add an implementation.h

My feeling is that header files are mostly used to define a public API, which is not the case here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not necessarily true at all--headers are just so that declarations an be shared between compilation units, and doesn't necessarily have anything to do with public APIs. It's not unusual to have headers that are used internally for build purposes, but are never "installed".



/* Additional platform-specific implementation code */
#if defined(__CYGWIN__) && defined(__x86_64__)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why defined(__x86_64__)? Even if you need that, it makes more sense to put that check in implementation_cygwin.h

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree--I'll change that.

@@ -0,0 +1,61 @@
/* Cygwin-specific implementation details */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a .c file instead of .h

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my original call as well--it was a ".c" file to start with and I had #include "implementation_cygwin.c". However, I found out the hard way that in this case re-running setup.py didn't rebuild implementation.o, which is annoying. Open to ideas.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out the hard way that in this case re-running setup.py didn't rebuild implementation.o, which is annoying.

That issue is independent of the filename.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would think, but when I changed the filename that seemed to fix it. Maybe it was something else.
I think that, rather than doing an "include" I'd rather just make this a separate (optional) compilation unit. However, that will require adding a couple header files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that, rather than doing an "include" I'd rather just make this a separate (optional) compilation unit. However, that will require adding a couple header files.

To me, that is overkill to fix a simple dependency problem. Cython and distutils allow adding arbitrary files as dependencies. I don't recall how that works though.

@@ -107,6 +108,60 @@ static inline void reset_CPU(void)
}


/* Reset all signal handlers and the signal mask to their defaults. */
static inline void sig_reset_defaults(void) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that calls to sig_reset_defaults and sigdie_for_sig always appear together, why make them separate functions? They are logically related because I need to reset the signal handlers for the kill in sigdie to work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay--I was sort of split on that because sig_reset_defaults looked like it could be useful on its own as well. But in practice it only ended being used in this one place. Alternatively I could keep sig_reset_defaults as a separate routine, but place a call to it at the beginning of sigdie.

@@ -344,6 +364,7 @@ static void setup_alt_stack(void)
ss.ss_size = sizeof(alt_stack);
ss.ss_flags = 0;
if (sigaltstack(&ss, NULL) == -1) {perror("sigaltstack"); exit(1);}
PLATFORM_SETUP_ALT_STACK;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like PLATFORM_SETUP_ALT_STACK.

I would rather write this as

#ifdef __CYGWIN__
    cygwin_setup_alt_stack();
#endif

just to be explicit that something is happening here only on Cygwin.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was in theory other platforms might add something here, but I'm just as happy to do the above.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if multiple platforms need to do something, I would prefer platform-specific #ifdefs.

@jdemeyer
Copy link
Collaborator

It would be interesting to know what happens when you don't call setup_alt_stack at all. For all we know, that might just work.

@embray
Copy link
Collaborator Author

embray commented Mar 27, 2018

For all we know, that might just work.

No, it won't. Cygwin's behavior here is in line with Linux--if a stack overflow occurs it actually disables any non-default SIGSEGV handler unless we have an alternative stack.

@embray
Copy link
Collaborator Author

embray commented May 1, 2018

Going to see if I can get this cleaned up. After re-reading this, it probably does make sense to still have a separate implementation_cygwin.{c,h} since there's enough code in it that it would be a lot of noise to include in the main implementation.c.

embray added 3 commits May 2, 2018 23:25
when handling an exception occurring during signal handling on an
alternate stack

as on other systems, this basically just prints an error and exits with
an appropriate return code (128 + signal number); it also attempts to
print a stack trace but this currently fails on Cygwin, and we can live
with that for now
@embray
Copy link
Collaborator Author

embray commented May 2, 2018

Finally got around to updating this. I think it will be a bit better now.

We still need to add CI for Cygwin, and I can do that. For now you'll have to I guess take my word for it that this works.

@embray
Copy link
Collaborator Author

embray commented May 2, 2018

I moved the call to sig_reset_defaults inside sigdie_for_sig--I agree it doesn't make sense to not call them together. But they still sort of make sense to exist as separate functions.

@jdemeyer
Copy link
Collaborator

jdemeyer commented May 2, 2018

The __version__ stuff that you added is breaking on Python 2.7 according to Travis CI.

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Oops, I don't think I meant to add that stuff here at all. I think they were just uncommitted changes that I accidentally git ci -a'd

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Ugh. It might be unrelated to this, but now I'm getting a hang when I run make check that didn't happen before I rebased on current master.

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Nevermind, I think might be user error. However, now I'm getting a new failure:

File "src/cysignals/signals.pyx", line 181, in signals.pyx
Failed example:
    init_cysignals()
Expected:
    <cyfunction python_check_interrupt at ...>
Got:
    <built-in function python_check_interrupt>

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Still user/environment error. But now I'm back to the tests hanging in src/cysignals/tests.pyx. I'm going to try going back to the master branch instead of mine and see if it's still a problem.

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

This is weird: make check currently runs make check-prefix and then make check-user. If I, starting from a clean repository, run both of those targets in order, manually, everything works. However, if I just run make check, again from a clean repo, it hangs in the check-user-doctest target on:

export PYTHONUSERBASE="`pwd`/tmp/user" && ulimit 2>/dev/null -s 1024; python -B rundoctests.py src/cysignals/*.pyx
Doctesting 5 files.
src/cysignals/alarm.pyx
src/cysignals/pselect.pyx
src/cysignals/pysignals.pyx
src/cysignals/signals.pyx
src/cysignals/tests.pyx

I think some of these makefile rules really need to be cleaned up in general, but it's disconcerting that something about how the makefile is written can actually cause the tests to hang. So I need to keep looking into it. This behavior seems to be deterministic though.

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Alright. After more extensive testing over lunch, it looks like this might not be entirely deterministic after all, and maybe it has nothing to do with the different make targets as my initial testing suggested. So now I just need to find out exactly what test it's hanging on...

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Okay. When it does hang it seems to be happening in test_interrupt_bomb. This is hard to reproduce, but I suspect that's what happening is that the "Some time later" is too short for Cygwin, considering the additional overhead of signal handling there, and the SIGABRT is getting lost in the shuffle. I don't know that that should happen, and it might indicate a bug in Cygwin. But I don't think there's anything to fix here other than to make the test a little more robust somehow (i.e. ensure it can't block forever).

@embray
Copy link
Collaborator Author

embray commented May 3, 2018

Anyways, I don't think the hang on this test is related to this issue otherwise.

@jdemeyer
Copy link
Collaborator

jdemeyer commented May 9, 2018

When it does hang it seems to be happening in test_interrupt_bomb.

Interesting that you get a hang, that indicated that the SIGABRT is completely ignored. That seems to me like the least likely outcome (I could understand a hard crash).

I would be interested to see the output of this test when cysignals was configured with ./configure --enable-debug.

Does it help if you increase the delay in

signal_after_delay(SIGABRT, base_delay + 10*n + 1000)

@jdemeyer
Copy link
Collaborator

jdemeyer commented May 9, 2018

Note that the test_interrupt_bomb is really about the interaction between the many SIGINTs. The SIGABRT is just meant to terminate the loop, it's not really considered part of the test. So there might be a genuine bug here that the SIGINT interacts with the SIGABRT.

@jdemeyer
Copy link
Collaborator

jdemeyer commented May 9, 2018

Reading the cysignals code, it seems that the case of two different signals arriving together isn't really supported well. So if you get a SIGABRT while handling a SIGINT, the SIGABRT might be swallowed.

This might be fixable, but it's not completely obvious because there are many race conditions to think about. Even then, the SIGABRT would occur outside of the sig_on and would just crash the test.

So I should probably change the SIGABRT mechanism in this test to some delay.

@embray
Copy link
Collaborator Author

embray commented May 9, 2018

Reading the cysignals code, it seems that the case of two different signals arriving together isn't really supported well. So if you get a SIGABRT while handling a SIGINT, the SIGABRT might be swallowed.

Yes, that's what I was implying in this comment. It's entirely possible for the SIGABRT to just never be seen by cysignals under such heavy bombardment.

But anyways this discussion might be better continued elsewhere since it's not actually pertinent to this ticket (it just came up here because I didn't start seeing this failure until I most recently tested this ticket).

@jdemeyer jdemeyer closed this May 14, 2018
@embray
Copy link
Collaborator Author

embray commented May 14, 2018

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failures on Cygwin
2 participants