initialization for time() not complete in certain cases. #3663

Closed
amitmurthy opened this issue Jul 10, 2013 · 2 comments

Comments

@amitmurthy
Contributor

Saw this just once:

   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.2.0-2508.r2d97566e.dirty
 _/ |\__'_|_|_|\__'_|  |  Commit 2d97566e44 2013-07-09 23:05:21*
|__/                   |  x86_64-linux-gnu

julia> addprocs(10)
Master process (id 1) could not connect within 60.0 seconds.
exiting.

Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.
Master process (id 1) could not connect within 60.0 seconds.
exiting.

This happened when I ran addprocs(10) right after launching julia. All 10 workers exited immediately after the call, presumably because the first call to time() in the code below returned 0, which would make time() - start exceed the timeout on the very first check:

        start = time()
        while !haskey(map_pid_wrkr, 1) && (time() - start) < timeout
            sleep(1.0)
        end

        if !haskey(map_pid_wrkr, 1)
            print(STDERR, "Master process (id 1) could not connect within $timeout seconds.\nexiting.\n")
            exit(1)
        end

Since it happened to all 10 processes and is not reproducible, I am guessing that something on my Linux system delayed some initialization required by time(), which is just a ccall to clock_now.

@JeffBezanson
Member

clock_now just calls gettimeofday. I'm not aware that it's possible for that to return an incorrect value. It doesn't even return any error codes.
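
For illustration only, a wrapper of the kind described here might look roughly like the sketch below. This is not the actual Julia source: the name clock_now_sketch is hypothetical, and the conversion to a double of seconds is an assumption about how such a wrapper would typically be written. It shows that there is no separate initialization step and no error path, so a result of 0 would require the system clock itself to report the epoch.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical stand-in for clock_now (not the real Julia implementation).
 * Returns wall-clock time as seconds since the Unix epoch, with microsecond
 * resolution folded into the fractional part.
 */
double clock_now_sketch(void)
{
    struct timeval now;

    /* gettimeofday returns 0 on success and -1 on failure; this sketch
     * ignores that result, so the wrapper itself reports no errors. */
    gettimeofday(&now, NULL);

    return (double)now.tv_sec + (double)now.tv_usec * 1.0e-6;
}
```

Because this reads the wall clock, the value can jump if the system time is adjusted; a wait loop that must not be affected by such jumps would usually be built on a monotonic clock instead.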

@amitmurthy
Contributor Author

It could have been because I had not run make clean for quite some time. Will reopen if I see it again.
