Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

on windows, zdir test fails #1604

Closed
bbdb68 opened this issue Jan 15, 2017 · 19 comments
Closed

on windows, zdir test fails #1604

bbdb68 opened this issue Jan 15, 2017 · 19 comments

Comments

@bbdb68
Copy link
Contributor

bbdb68 commented Jan 15, 2017

with vs 2015, in relase only. It runs in debug mode, here (line 998):
// wait for notification of the file being added
char *path;
int rc = zsock_recv (watch, "sp", &path, &patches);

the recv call return -1.

@bluca
Copy link
Member

bluca commented Jan 15, 2017

-1 from zsock_recv could be:

  1. it didn't receive anything:
    if (!msg)
        return -1;              //  Interrupted
  1. the actor died:
    //  Filter a signal that may come from a dying actor
    if (zmsg_signal (msg) >= 0) {
        zmsg_destroy (&msg);
        return -1;
  1. it didn't receive a pointer:
                    if (zframe_size (frame) == sizeof (void *))
                        *pointer_p = *((void **) zframe_data (frame));
                    else
                        rc = -1;

Are you able to reproduce it through a debugger and check what's happening?

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 15, 2017

the actor died

@bluca
Copy link
Member

bluca commented Jan 15, 2017

A 1 signal could also be an error. When running in verbose mode, is there an error printed by the zdir_watch actor? "zdir_watch: ..." (see s_on_command function in zdir.c)

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 15, 2017

here the output of the test in verbose mode

Running czmq test 'zdir'...

  • zdir:
    zdir-test-dir/initial_file
    zdir-test-dir/test_abc
    I: 17-01-15 19:03:37 zdir_watch: Command received: TIMEOUT
    I: 17-01-15 19:03:37 zdir_watch: Setting directory poll timeout to 100
    I: 17-01-15 19:03:37 zdir_watch: Successfully set directory poll timeout to 100
    totoI: 17-01-15 19:03:37 zdir_watch: Command received: SUBSCRIBE
    I: 17-01-15 19:03:37 zdir_watch: Subscribing to directory path: zdir-test-dir
    I: 17-01-15 19:03:37 zdir_watch: Successfully subscribed to zdir-test-dir
    I: 17-01-15 19:03:37 zdir_watch: Command received: UNSUBSCRIBE
    I: 17-01-15 19:03:37 zdir_watch: Unsubscribing from directory path: zdir-test-di
    r
    I: 17-01-15 19:03:37 zdir_watch: Successfully unsubscribed from zdir-test-dir
    I: 17-01-15 19:03:37 zdir_watch: Command received: SUBSCRIBE
    I: 17-01-15 19:03:37 zdir_watch: Subscribing to directory path: zdir-test-dir
    I: 17-01-15 19:03:37 zdir_watch: Successfully subscribed to zdir-test-dir
    I: 17-01-15 19:03:38 zdir_watch: Found 1 changes in zdir-test-dir:
    I: 17-01-15 19:03:38 zdir_watch: zdir-test-dir/test_abc created
    Appuyez sur une touche pour continuer...

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 15, 2017

note the "toto" string is an output I added ;-)

@bluca
Copy link
Member

bluca commented Jan 15, 2017

Not sure why the actor is stopping then. If you are able to debug it further and find a fix, PRs are welcome :-)

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 17, 2017

and the test does not fail in debug mode, not easy to track. no idea for now.

@bluca
Copy link
Member

bluca commented Jan 17, 2017

Does debug mode disable compiler optimizations on Windows?

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 20, 2017

looking at #1608 build result, the only failing test is now this one. The point is it also fails in one of the linux configuration, to this may be not strictly windows related.

@bluca
Copy link
Member

bluca commented Jan 20, 2017

Does it fail on Linux? Where? Haven't seen it consistently fail, but having it always pass on debug mode and never otherwise to me is a strong suggestion that there is some undefined behaviour problem, if debug on Windows removes compiler optimisation

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 20, 2017

see https://travis-ci.org/zeromq/czmq/builds/193541925

this is the build of your last commit (for shutdown on each test), it just fails on one linux configuration... in zdir test. I just noticed that, I have no time for more investigation yet.

@bluca
Copy link
Member

bluca commented Jan 20, 2017

That's Mac OS x rather than Linux, but most important it only happens rarely, not consistently
Do you know the answer to the windows debug question? It's the best lead so far, if it's indeed caused by compiler optimizations then it gives a good hint that there's some undefined behaviour in our code

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 20, 2017

ah ok. I had no time to investigate yet, I'll try do so.

@bluca
Copy link
Member

bluca commented Jan 20, 2017

Thanks!

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 21, 2017

ok I'v got it. The issue is that in windows release mode, NDEBUG flag is set (by CMAKE) and so assert macros are disabled. In the zdir test there are many call within assert, so they are no more executed in release mode...

as an exemple all synchronisation is lost in such calls :
https://github.com/zeromq/czmq/blob/master/src/zdir.c#L981

my understanding is that the fix is two fold:

  • rework the test so that no calls are done within asserts, to make the test more robust and avoid difference in code behaviour with or without NDEBUG.
  • change the appveyor build file so that NDEBUG is disabled since assert is used as a unit testing tool.

I will prepare the PR

@bbdb68
Copy link
Contributor Author

bbdb68 commented Jan 21, 2017

before changing the appveyor file, I propose a PR on zproject so that we can easily know if assert is enabled when running tests

@bluca
Copy link
Member

bluca commented Jan 21, 2017

Great catch! I've been burned by that before (in zproto), should have seen it.

Now the tests are all passing finally:

https://ci.appveyor.com/project/zeromq/czmq

Thank you again!

@bluca bluca closed this as completed Jan 21, 2017
@f70w
Copy link

f70w commented Feb 12, 2017

Not sure why you closed this...

zsock_send (watch, "si", "TIMEOUT", 100);
assert (zsock_wait (watch) == 0);

zsock_send (watch, "ss", "SUBSCRIBE", "zdir-test-dir");
assert (zsock_wait (watch) == 0);

zsock_send (watch, "ss", "UNSUBSCRIBE", "zdir-test-dir");
assert (zsock_wait (watch) == 0);

zsock_send (watch, "ss", "SUBSCRIBE", "zdir-test-dir");
assert (zsock_wait (watch) == 0);

.....

// poll for a certain timeout before giving up and failing the test.
assert (zpoller_wait (watch_poll, 1001) == watch);

the zdir test is still crashing when build as release using vs 2015. I'am not sure whether its related to all the functions calls being removed by the above use of the assert macro or not, since the crash happens when it calls zlist_pop with a null pointer for patches.

@f70w
Copy link

f70w commented Feb 12, 2017

nvm, seems like i had the wrong source

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants