Last function in worker library included with the -L option is not correctly defined when a module is imported #21704

Closed
Chunkulator opened this issue May 5, 2017 · 8 comments

Comments

@Chunkulator

BrokenFunction.zip

Loving Julia so far, but I keep running into what looks like a bug when using the -L option to load a worker library with the parallel computing functionality in Julia 0.5.1. It happens on both the Linux and Mac versions of Julia. If both of the following conditions are met (which they will be for any non-trivial worker library), the last function defined in the worker library ("junk_function" in the attached example) fails with an UndefVarError raised from inside Julia's own serialization code:

  1. The worker library is included using the -L option.
  2. The worker library imports at least one module.

The same worker library included directly from the main julia script using an include statement works fine. Removing the import of the module from the worker script causes junk_function to be correctly defined.

Steps to reproduce with the attached minimal example:

  1. export JULIA_LOAD_PATH=.
  2. julia library.jl (this works fine)
  3. julia -p 2 -L library.jl main.jl (this is broken with the following backtrace)
prompt$ julia -p 2 -L library.jl main.jl
WARNING: replacing module TestModule
WARNING: replacing module TestModule
ERROR: UndefVarError: #junk_function not defined
 in deserialize_datatype(::Base.ClusterSerializer{TCPSocket}) at ./serialize.jl:825
 in handle_deserialize(::Base.ClusterSerializer{TCPSocket}, ::Int32) at ./serialize.jl:571
 in deserialize_msg(::Base.ClusterSerializer{TCPSocket}, ::Type{Base.ResultMsg}) at ./multi.jl:120
 in message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1317
 in process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at ./multi.jl:1276
 in (::Base.##638#639{TCPSocket,TCPSocket,Bool})() at ./event.jl:68
 in #remotecall_fetch#626(::Array{Any,1}, ::Function, ::Function, ::Base.Worker, ::String, ::Vararg{String,N}) at ./multi.jl:1070
 in remotecall_fetch(::Function, ::Base.Worker, ::String, ::Vararg{String,N}) at ./multi.jl:1062
 in #remotecall_fetch#629(::Array{Any,1}, ::Function, ::Function, ::Int64, ::String, ::Vararg{String,N}) at ./multi.jl:1080
 in remotecall_fetch(::Function, ::Int64, ::String, ::Vararg{Any,N}) at ./multi.jl:1080
 in (::Base.##946#949{Base.JLOptions})() at ./task.jl:360
 in sync_end() at ./task.jl:311
 in macro expansion at ./task.jl:327 [inlined]
 in process_options(::Base.JLOptions) at ./client.jl:235
 in _start() at ./client.jl:321
 in _start() at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?

Version info:

julia> versioninfo()
Julia Version 0.5.1
Commit 6445c82 (2017-03-05 13:25 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)
@Chunkulator
Author

Also, this breakage seems to be identical whether the cluster is local (-p) or remote (--machinefile).

@yuyichao
Contributor

yuyichao commented May 5, 2017

Dup of #15996

yuyichao closed this as completed May 5, 2017
@StefanKarpinski
Member

@yuyichao: Given that you're closing as a dup of a closed issue with no follow-up discussion, could you suggest a work-around or some other helpful feedback on this problem? I don't really understand the issue myself but since you've encountered it you may be able to explain it.

@yuyichao
Contributor

yuyichao commented May 5, 2017

Closing as a dup of a closed issue automatically means that it's fixed. I also linked #15996 rather than #15766 directly, since I already included a workaround in that report.

@StefanKarpinski
Member

StefanKarpinski commented May 5, 2017

There is no indication in that issue that it's fixed, just that it's been closed, which is not the same. So is the status that it is fixed on master but the fix cannot be backported to 0.5?

@yuyichao
Contributor

yuyichao commented May 5, 2017

It's closed as a dup of #15766 (comment). I don't think the fix can be backported; see #15766 (comment).

@StefanKarpinski
Member

Ok, thanks for the clarification.

@Chunkulator
Author

This is very helpful, thanks. From the linked dup issue I can see that the simple workaround for this problem is to add a final "nothing;" line at the bottom of the worker library, since the problem has to do with serialising the return value of the include.
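
For anyone landing here later, a rough sketch of why a trailing nothing changes things. It is not taken from the attached BrokenFunction.zip: the file names and the body of junk_function below are made up for the demonstration. It only shows that include returns the value of the last expression in the file, and that a function definition evaluates to the function itself.

```julia
# Hypothetical stand-ins for the worker library; the real library.jl from
# the attachment is not reproduced here.
dir = mktempdir()
without_nothing = joinpath(dir, "library_without_nothing.jl")
with_nothing = joinpath(dir, "library_with_nothing.jl")

# Library whose last expression is a function definition.
open(without_nothing, "w") do io
    println(io, "junk_function() = 42")
end

# Same definition, followed by a trailing `nothing;`.
open(with_nothing, "w") do io
    println(io, "junk_function() = 42")
    println(io, "nothing;")
end

# include returns the value of the last expression evaluated in the file.
result_without = include(without_nothing)
result_with = include(with_nothing)

println(isa(result_without, Function))  # the function object itself
println(result_with === nothing)        # plain `nothing`
```

The backtrace above shows remotecall_fetch being called from process_options, so with -L the master process appears to fetch that return value back from each worker. If so, ending the file with nothing means only nothing has to be serialised, rather than the last function.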
