Allowing driver and worker processes to load different modules #5775

Closed
wants to merge 1 commit into from

timholy
Member

@timholy timholy commented Feb 12, 2014

On a single machine, I have a driver process that needs to interact with the user; it makes use of several graphics/plotting packages & modules. This driver process delegates its computational work to several worker processes, each of which relies on quite a large number of compute-focused packages & modules. Loading all this code into a single process takes approximately 1 minute, which is a large barrier to entry. But in principle, there's no reason the driver process needs to load the compute packages, and there's no reason the worker processes need to load the graphics packages. So it seems this time could be shaved down quite a bit by some partitioning.

I did some experiments trying to get this working, and with current master the strategy I found seems a bit sub-optimal (this PR may improve matters). The main challenge is that the driver needs to call functions inside modules that it knows nothing about. C permits this by allowing you to declare a function without defining it, but I'm not aware of anything similar in Julia. So the solution seems to be to rely on eval inside of Main.

With current master, here are the various files I needed:

# MyWorker.jl
module MyWorker

export compute

compute(x) = x+1

end

# Glue.jl
module Glue

export evalfetch

evalfetch(r::RemoteRef) = eval(Main, fetch(r))

end

# MyDriver.jl
module MyDriver

using Glue

export main

function main(wpid)
    # Load the code that the worker process needs to do its job. 
    # It needs to be evaled in Main to prevent the serializer from
    # complaining about not knowing about MyDriver on wpid
    # (this is not an issue if you execute this from the REPL)
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))

    # Now set up the function call, but wrap it in an expression to avoid
    # errors due to not knowing about MyWorker
    rr = RemoteRef(wpid)
    x = 15
    put(rr, :(MyWorker.compute($x)))
    return remotecall_fetch(wpid, evalfetch, rr)
end

end

Then run this as follows (I'm using include to avoid any possible funny business with require loading things on all workers):

wpid = addprocs(1)[1]

@everywhere using Glue

include("MyDriver.jl")
MyDriver.main(wpid)

and you get back 16.

In an ideal world, I'd even put these first two lines inside MyDriver.main(); it seems better not to have to bother the user with spawning new processes and loading code on them. But it seems hard to get Glue into MyDriver after-the-fact.

With this PR, in addition to MyWorker.jl, here's what you need:

module SimpleDriver

export main

function main()
    wpid = addprocs(1)[1]
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))

    remotecall_fetch(wpid, :(MyWorker.compute), 15)
end

end

include("SimpleDriver.jl")
SimpleDriver.main()

I don't (yet!) have a good understanding of how multi.jl works, so this should be viewed as a first attempt.

@timholy
Member Author

timholy commented Feb 12, 2014

Ah, I did just discover that the following works on master:

module SimpleDriver

export main

function main()
    wpid = addprocs(1)[1]
    wait(eval(Main, :(@spawnat $wpid include("MyWorker.jl"))))
    eval(Main, :(@everywhere compute(x) = MyWorker.compute(x)))
    remotecall_fetch(wpid, Main.compute, 22)
end

end

So this is not necessary. I guess there's still a question of how easy we can/want to make this kind of thing. I can imagine syntax something like this:

MyWorkerRR = @importat 2 MyWorker   # a ModuleRemoteRef
x = MyWorkerRR.compute(22)          # performs the work on 2
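
One way the proposed syntax could be approximated is sketched below. This is not the PR's implementation: it assumes a current Julia 1.x with the Distributed standard library (whose argument order differs from the 2014 API used above), and `RemoteModule` is a hypothetical type invented for illustration. The call is shipped as an expression and evaluated in the worker's Main, so the driver never needs a binding for MyWorker.

```julia
using Distributed

# Hypothetical proxy type; not part of Base, Distributed, or this PR.
struct RemoteModule
    pid::Int       # worker that has the module loaded
    name::Symbol   # module name as seen from the worker's Main
end

# `proxy.f` returns a driver-side closure that sends `Name.f(args...)` to the
# worker as an Expr. Arguments are wrapped in QuoteNode so they are passed as
# values rather than re-evaluated on the worker.
function Base.getproperty(proxy::RemoteModule, fname::Symbol)
    pid = getfield(proxy, :pid)
    modname = getfield(proxy, :name)
    target = Expr(:., modname, QuoteNode(fname))
    return (args...) -> remotecall_fetch(
        Core.eval, pid, Main,
        Expr(:call, target, map(QuoteNode, args)...))
end

wpid = addprocs(1)[1]

# Define a stand-in MyWorker on the worker only. remotecall_wait leaves the
# result on the worker, so the Module object is never sent back to the driver.
remotecall_wait(Core.eval, wpid, Main, :(module MyWorker
    compute(x) = x + 1
end))

MyWorkerRR = RemoteModule(wpid, :MyWorker)
x = MyWorkerRR.compute(22)   # performs the work on wpid
println(x)

rmprocs(wpid)
```

The closure returned by `getproperty` only ever runs on the driver; what crosses the wire is `Core.eval`, `Main`, and an `Expr`, all of which exist on every process.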

@JeffBezanson
Member

I think something like @fetchfrom wpid eval(Main,:f)(x) would also work, so I don't think we need to add this.

@timholy
Member Author

timholy commented Feb 13, 2014

It's not quite sufficient on its own, because you get errors like these:

julia> MySimpleDriver.main()
fatal error on 2: ERROR: MySimpleDriver not defined
 in deserialize at serialize.jl:333
 in handle_deserialize at serialize.jl:321
 in deserialize at serialize.jl:371
 in handle_deserialize at serialize.jl:321
 in deserialize at serialize.jl:360
 in handle_deserialize at serialize.jl:321
 in anonymous at task.jl:790
Worker 2 terminated.

The problem seems to be that the caller's module becomes part of the serialized data sent to the worker, and that name is not defined on the worker. (This is the same issue that forces you to wrap include inside an eval(Main, ...).)

However, eval(Main, :(@fetchfrom $wpid eval(Main,:(MyWorker.compute))(22))) does work. That's not entirely transparent, so is it worth considering whether we can make that easier to use?

I've come up with one solution that seems promising; I've updated the code and even written some tests (which, naturally, demonstrate how to use this version). To me this seems much easier to use than anything yet considered, so I'm reopening.

@timholy timholy reopened this Feb 13, 2014
@timholy
Member Author

timholy commented Feb 13, 2014

The test passed with clang; the gcc failure seems to be because the build never even started. I didn't find a way to force it to try again.

@carlobaldassi
Member

> I didn't find a way to force it to try again.

I'd use

git commit --amend --no-edit

which creates a new commit identical to the previous one but with a different sha1 (assuming you didn't git-add anything in the meantime, of course). Then force-push the branch, and the build should be triggered again.

@tknopp
Contributor

tknopp commented Feb 13, 2014

I don't fully see what is not working, but in Julietta I use this approach:

function execute(term::Terminal, cmd::String)
  # Parse on the driver; the block below captures the resulting expression
  ex = Base.parse_input_line(cmd)
  s = @fetchfrom term.id begin
    ex = expand(ex)
    value = eval(Main,ex)
    # Store the result as `ans` in the worker's Main
    eval(Main, :(ans = $(Expr(:quote, value))))
    sprint(Base.showlimited,value)
  end
end

When I call execute as execute(terminal,"import REPLCompletions"), this works and I can use REPLCompletions from the worker.

@timholy
Member Author

timholy commented Feb 13, 2014

@carlobaldassi, thanks for the tip. I will use that in the future. I think I came up with a better name for the macro anyway.

@tknopp, not sure I fully understand what you're doing, but what I want to do is the following: process A has module AA loaded, and process B has module BB loaded. From inside A, I want to be able to call BB.functionB(args...) to run on process B and return the results to A, without having to load module BB into process A.
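
A minimal sketch of that goal, assuming a current Julia 1.x with the Distributed standard library rather than the 2014 API discussed in this thread. The body of `BB.functionB` is a stand-in invented for the example. The call is sent as an expression and evaluated in the worker's Main, so nothing on the driver refers to BB.

```julia
using Distributed

wpid = addprocs(1)[1]

# Define BB on the worker only (a stand-in for a heavy compute package).
# remotecall_wait keeps the result on the worker, so the Module object is
# never serialized back to the driver.
remotecall_wait(Core.eval, wpid, Main, :(module BB
    functionB(x) = x + 1
end))

# Build the call as data, interpolating the driver-side value of x.
x = 15
result = remotecall_fetch(Core.eval, wpid, Main, :(BB.functionB($x)))

println(result)
println(isdefined(Main, :BB))   # driver-side check that BB was not loaded here

rmprocs(wpid)
```

Passing `Core.eval` with an `Expr` avoids closures entirely, at the cost of building calls by hand.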

@tknopp
Contributor

tknopp commented Feb 13, 2014

Ok, I see. I am actually doing the same thing, and now I understand why my "driver side" cannot be in a module (see https://github.com/tknopp/Julietta.jl/blob/master/src/Julietta.jl lines 1 and 2). So it seems that it works when you remove the module scope from module A.

@timholy
Member Author

timholy commented Feb 13, 2014

You'd need to remove the module scope from B as well. And if they have different functions defined in Main, I'm still not sure you can call one from the other, because the serializer needs to know to serialize myfunction as a function.

One thing to note is that your import REPLCompletions may be loading it into all processes, as can be inferred from

julia> addprocs(1)
1-element Array{Any,1}:
 2

julia> import REPLCompletions

julia> reload("REPLCompletions")
Warning: replacing module REPLCompletions
Warning: replacing module REPLCompletions

The fact that there are two warnings is evidence that both know about REPLCompletions.

What I'm shooting for here is the ability to have complete separation. While my motivation was purely to decrease load times, another potential application is that if I have code that I'm not willing to share but willing to let you test-drive, I could open a TCP/IP port on one of my servers and allow you to execute my code remotely.

@tknopp
Contributor

tknopp commented Feb 13, 2014

Was not aware of that. I thought the module would be only loaded locally and one would need @anywhere to import them on all processes. So is this a bug or a feature?

@pao
Member

pao commented Feb 13, 2014

@tknopp reminder: `@anywhere`, with backtick quotes to disable the resulting notification.

@timholy
Member Author

timholy commented Feb 13, 2014

Mostly a feature, I think. But sometimes you want more fine-grained control.

@JeffBezanson
Member

Ah, now I get it. The entire problem is the reference to the enclosing module that the serialized closure has. The closure needs to be "rebased" on top of Main.

Some of the issues can be avoided by avoiding closures. For example remotecall_wait(p, Core.include, filename) should work.

Constructs like @spawnat could "rebase" their closures at compile time so the inner module reference is Main, and any globals inside are fully qualified. That way if nothing from the offending module is actually needed, it will just work.
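
The closure-free approach can be sketched as follows, assuming a current Julia 1.x where the function comes first in `remotecall_wait` and the two-argument `Base.include(Main, path)` is used in place of `Core.include`. The temporary MyWorker.jl file is a stand-in written only for this demo.

```julia
using Distributed

wpid = addprocs(1)[1]

# Write a stand-in MyWorker.jl to a temporary directory for the demo.
dir = mktempdir()
path = joinpath(dir, "MyWorker.jl")
write(path, """
module MyWorker
export compute
compute(x) = x + 1
end
""")

# Pass a named function plus plain arguments. No closure is created, so no
# reference to the caller's enclosing module is serialized.
remotecall_wait(Base.include, wpid, Main, path)

# Call into the worker-only module by sending an expression to its Main.
y = remotecall_fetch(Core.eval, wpid, Main, :(MyWorker.compute(22)))
println(y)

rmprocs(wpid)
```

An absolute path is used because the worker resolves the file name itself; this sketch assumes the worker runs on the same machine as the driver.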

@timholy
Member Author

timholy commented Feb 13, 2014

In usage this is proving more fragile than I expected, so if you know how to do this...

@timholy
Member Author

timholy commented Feb 13, 2014

(I keep getting "cannot serialize a pointer" errors, and can't figure out why)

@amitmurthy
Contributor

You need to write a custom serialize/deserialize for one of your types maybe? Like this - https://github.com/JuliaLang/julia/pull/5788/files

@vtjnash
Member

vtjnash commented Aug 25, 2015

> Constructs like @spawnat could "rebase" their closures at compile time so the inner module reference is Main, and any globals inside are fully qualified. That way if nothing from the offending module is actually needed, it will just work.

this was one of the hardest parts of tricking the parallel tests into not depending on being run from Main:
165cabc

@vtjnash
Member

vtjnash commented Jun 26, 2019

I think we've now addressed some of these points (with @everywhere <workers> <expr>, for example) and changed the code design of loading.
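
As an illustration of the `@everywhere <workers> <expr>` form, here is a minimal sketch assuming a current Julia 1.x with the Distributed standard library. The `compute` definition is a stand-in; it is evaluated in Main on the listed worker only, and the driver then calls it by sending an expression.

```julia
using Distributed

wpid = addprocs(1)[1]

# Evaluate the definition in Main on the listed worker only.
@everywhere [wpid] compute(x) = x + 1

# The driver has no `compute` binding of its own.
println(isdefined(Main, :compute))

# Call the worker-side function by sending an expression to its Main.
z = remotecall_fetch(Core.eval, wpid, Main, :(compute(22)))
println(z)

rmprocs(wpid)
```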

@vtjnash vtjnash closed this Jun 26, 2019
@DilumAluthge DilumAluthge deleted the teh/workermodules branch January 12, 2021 21:26
7 participants